
Malloc envy - greenhathacker
https://engineering.vena.io/2016/04/04/malloc-envy/
======
amluto
> It's also important not to overstate the duration of GC pauses. Applications
> don't need to stop for the entire duration of a garbage collection cycle:
> they can continue to run and mutate objects (and even allocate objects), so
> long as they coordinate with the GC.

I'm not much of a Java user, nor do I use GC for any performance-critical
code, but I write code that interacts over the network with quite a few Java
systems. These systems have all kinds of performance problems, a lot of which
involve GC pauses.

Some of those systems come from vendors who have very competent teams and
invest heavily in their systems, and some have tried fancy, expensive JVMs
with supposedly better real-time performance and have failed to get them to
work.

So I'd offer a statement in the other direction: it's important not to
overstate the ease of avoiding GC performance gotchas. Maybe it can be done,
but that doesn't mean that you're going to be able to do it, especially if you
don't design all parts of your application from the beginning to operate
within the constraints of real-time GC implementations.

~~~
rbehrends
The JVM has a lot of other reasons to pause [1]. A big problem with using the
JVM as an example is that it's an enormously complex architecture that wasn't
really designed for the kind of workloads it is subjected to these days. For
example, the complexities involved in having a JIT compiler in a multi-
threaded environment that wasn't designed from the ground up for that purpose
can be pretty scary (e.g. the method deoptimization issue in the video). The
effort that has gone into the various JVM implementations both impresses and
scares me.

[1]
[https://www.youtube.com/watch?v=Y39kllzX1P8](https://www.youtube.com/watch?v=Y39kllzX1P8)

~~~
haberman
In what sense was the JVM not designed to be a JIT compiler in a multi-
threaded environment?

~~~
jerf
Like (somewhat ironically) Javascript, Java's origins can be very easy to
forget, and also like Javascript, those origins are surprisingly humble.
Javascript was banged together in a week, and Java was intended to be a
language for developing constrained set-top box UIs in the 1990s. While that
does not render them intrinsically unsuitable, after decades of development,
for the sort of tasks they are put to today, the early days of a language
still inevitably color it forever when backwards compatibility is a big
concern.

~~~
sangnoir
To distill your contribution to its essence:

 _Parent: what's wrong with the JVM as a multithreaded JIT?

You: Java is similar to Javascript. Javascript was terribly designed in the
90s. Therefore Java sucks. QED._

You should try your hand at politics! Don't answer the question (Java);
instead, segue into the more comfortable topic, what _you_ want to talk about
(Javascript).

~~~
Manishearth
Um, jerf gave a reason why Java's history affects it here and noted how it's
similar to Javascript in that regard on the side; not the other way around.
The JS comparison was not part of their argument.

~~~
sangnoir
But that has _nothing_ to do with the question he is responding to, which is
about Java's suitability as a multithreaded JIT.

------
gizmo
I've heard these arguments many times before, about how in theory GC'd
languages can have less memory allocation/deallocation overhead. This
theoretical benefit never materializes, though. This is because in a language
with garbage collection the programmer barely thinks about memory at all,
which results in millions of tiny allocations happening all over the place,
and that wrecks performance. Contrast this with languages with manual memory
management, where programmers use strategies that minimize its headaches
(memory leaks and double-frees), which results in vastly fewer memory
operations.

In a dynamic language, people are tempted to write things like
`people.pick('name').capitalize().sort()`, where containers of objects are
created at every stage of the pipeline just to be discarded moments later. In
C you wouldn't write code like that. Or if you do, you simply allocate a
single block of memory beforehand that's large enough for the entire operation
and reuse that memory block at every stage of the pipeline. In C, more
memory-efficient code is less work to write and easier to reason about.
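The single-buffer version of that pipeline might look like this in C; a
minimal sketch with hypothetical names, not a drop-in library:

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* One scratch block sized up front is reused for every stage:
 * copy into it, capitalize in place, then sort pointers into it. */
static int cmp_str(const void *a, const void *b) {
    return strcmp(*(const char *const *)a, *(const char *const *)b);
}

void capitalize_and_sort(const char **names, size_t n,
                         char *scratch, char **out) {
    char *p = scratch;
    for (size_t i = 0; i < n; i++) {
        size_t len = strlen(names[i]);
        memcpy(p, names[i], len + 1);               /* copy into scratch */
        p[0] = (char)toupper((unsigned char)p[0]);  /* capitalize in place */
        out[i] = p;
        p += len + 1;
    }
    qsort(out, n, sizeof out[0], cmp_str);  /* no per-stage containers */
}
```

No container is created or discarded per stage; the caller sizes `scratch`
once for the whole operation.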

So it really shouldn't be a surprise that in practice GC'd languages like Java
are dogs in terms of performance, nor should we expect this to change no
matter how clever the garbage collectors get. The "sufficiently intelligent
garbage collector" is a creature just as mystical as the "sufficiently
intelligent compiler".

~~~
xenadu02
This is nonsense. Games using Unity think about memory all the time, and
that's using C# and a GC runtime. (Here's a dirty little secret: C++ games use
object pools too, and for the same reason! You can't afford to take the
malloc/free overhead in the middle of rendering a frame, any more than you can
afford a GC pause!)
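For readers who haven't seen one, the kind of pool both C++ and C# games use
can be sketched in C in a few lines (names hypothetical; real engine pools add
debugging and alignment support):

```c
#include <stddef.h>

/* Fixed-capacity pool: all storage is reserved up front, so acquiring
 * and releasing an object is O(1) with no malloc/free in the frame loop. */
#define POOL_CAP 256

typedef struct Particle {
    float x, y;
    struct Particle *next_free;  /* intrusive free-list link */
} Particle;

typedef struct {
    Particle slots[POOL_CAP];
    Particle *free_head;
} ParticlePool;

void pool_init(ParticlePool *p) {
    for (size_t i = 0; i + 1 < POOL_CAP; i++)
        p->slots[i].next_free = &p->slots[i + 1];
    p->slots[POOL_CAP - 1].next_free = NULL;
    p->free_head = &p->slots[0];
}

Particle *pool_acquire(ParticlePool *p) {
    Particle *obj = p->free_head;
    if (obj) p->free_head = obj->next_free;
    return obj;                  /* NULL when the pool is exhausted */
}

void pool_release(ParticlePool *p, Particle *obj) {
    obj->next_free = p->free_head;   /* LIFO: recently freed is reused first */
    p->free_head = obj;
}
```

The LIFO free list also tends to keep recently touched objects warm in cache.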

Again, looking at C# (or Swift's lazy collections, or Java streams), your
example isn't quite correct. You can absolutely get single-pass lazy
evaluation of the elements, or at the very least avoid any temporary
intermediates.

The .NET runtime has various GCs, some of which are concurrent GCs that can do
thread-local pauses without pausing the entire process, as well as doing
extremely short pauses (a few ms) but otherwise running concurrently with
other threads doing allocations. The GC even does compaction, avoiding the
memory fragmentation problem that has driven many a C programmer crazy. (There
are other approaches like V8's JS GC.)

The idea that GC has to be a dog is nonsense. It has the benefit that
allocation is just adding sizeof(object) to a pointer, making allocation
trivial. It has the runtime overhead of occasionally needing to pause things
for memory safety reasons, depending on the specific implementation. It also
has a higher high-water mark for memory usage. It has the human overhead of
reasoning about living roots and avoiding unnecessary object churn (though
that is true of any system).

Reference counting has runtime overhead of incrementing/decrementing reference
counts, and human overhead of manually breaking retain cycles and remembering
to properly increment/decrement the counts (unless using things like ARC which
insert the calls automatically).

malloc/free has runtime overhead to manage slabs and free lists, plus
fragmentation problems under some workloads. It has a massive human overhead
of making the programmer remember to track and free the allocated memory
perfectly or risk leaking or corrupting memory, potentially leading to remote
code execution in the worst case.

These are all tradeoffs with costs and benefits.

~~~
kabdib
My experience with XNA (a C#-based gaming environment) was that allocating
memory and incurring GC was death. So you moved heaven and earth to avoid
making garbage.

I haven't used Unity. It's probably better. (XNA was essentially abandonware a
couple years after Microsoft hatched it -- a typical myopic lifecycle and a
good reason why you can't trust MS with your platforms unless they have a
_really_ strong money story behind them).

------
dmytroi
Currently I work as an engineer in video games, and I would say that the next
step is to write software completely without malloc+free and GC: you allocate
virtual pages directly from the OS, create your own pool allocators inside
them, and that's it. When you need to "free" stuff, you just wipe the whole
virtual page: an O(1) free, faster than anything you could imagine. We have
only three special cases: string operations (usually implemented with a
special string pool), the heap (malloc+free) for libraries that cannot live
without it, and everything else on the stack.

That said, I honestly don't understand why everyone talks only about the GC
and malloc+free abstractions, because they simply do not materialize in the
physical world at all (even malloc has to call the kernel when it needs a new
page); the CPU and kernel have no such mechanisms, so this is only an abstract
limitation in developers' heads. So why not just talk about, and measure,
what's actually real (if we actually end up in a situation where we need to
talk about memory, because usually it all just works)?

~~~
vvanders
Video games end up being a special case where you tend to have somewhat
discrete checkpoints and can dump large parts of RAM via pages.

Still, pooling is something that everyone should be taught, and it seems very
few are.

~~~
viraptor
I'm not sure it's that special, at least the arena allocator approach. It's a
great solution for servers too. If the connections are short-lived, you can
just give a few pages to each one and unmap them at disconnect.

------
scott_s
> An array in C++ has none of these things. However, consider that if the
> array is ever to be freed by free, then its size must be recorded anyway.
> That leaves 8 bytes of overhead, which is likely comparable to the overhead
> caused by malloc's fragmentation.

Actually, no, it does not need to explicitly record the size of the
allocation. Some implementations of malloc do this, using what are called
"headers". But more modern allocators tend to use techniques which avoid this.
They basically follow the pattern:

1. Maintain a list of slabs, where each slab contains objects of a particular
size (say, 4, 8, 16, 32, etc., but not necessarily always doubling).

2. Ensure that the slabs are in contiguous regions of virtual memory. The
start is always on a page boundary.

3. For each page, record its meta-information in either a giant array (for
small address spaces) or a radix tree (for large address spaces).

4. Upon a free, you can figure out what page an object lies in, then look up
its meta-information. Then you have all of the information you need to free an
object of a particular size. This function from an old project of mine shows
all three techniques (headers, radix tree, and giant array, also called a Big
Bag Of Pages):
[https://github.com/scotts/streamflow/blob/master/streamflow....](https://github.com/scotts/streamflow/blob/master/streamflow.c#L1386)
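To make the header-free lookup concrete, here is a toy version of the
page-map idea, with a fixed heap and a flat array standing in for the radix
tree (everything here is simplified relative to a real allocator):

```c
#include <stddef.h>
#include <stdint.h>

/* Toy "big bag of pages": each 4 KiB page holds objects of one size, and
 * a side table maps page index -> object size, so free() can recover the
 * size of any pointer without a per-object header. A real allocator keys
 * the table by address bits, since its pages are page-aligned. */
#define PAGE_SHIFT 12
#define NUM_PAGES  1024

static uint16_t page_obj_size[NUM_PAGES];  /* 0 = page unused */
static uint8_t  heap[NUM_PAGES << PAGE_SHIFT];

void mark_page(void *page_start, uint16_t obj_size) {
    size_t idx = (size_t)((uint8_t *)page_start - heap) >> PAGE_SHIFT;
    page_obj_size[idx] = obj_size;
}

uint16_t size_of(void *obj) {
    size_t idx = (size_t)((uint8_t *)obj - heap) >> PAGE_SHIFT;
    return page_obj_size[idx];  /* size known from the page alone */
}
```

The key point: the metadata cost is per page, not per object, so small
objects carry no header overhead at all.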

Something else the author did not mention is that when it comes to performance
tuning a GCed application, you have another variable: heap size. If the heap
size is too small, you will GC too often. If the heap size is too large, your
pauses when you do GC may be too large. It may require tuning to find that
sweet spot.

Also, this came up in a previous thread
([https://news.ycombinator.com/item?id=11058911](https://news.ycombinator.com/item?id=11058911)),
but the routines the author outlined for object allocation are quite similar
to the ones used in modern implementations of malloc.

~~~
Manishearth
This only works if destructors/finalizers don't exist. You still need runtime
type information if you want to be able to properly clean up a thing. E.g.,
what would happen if I put a `std::vector` on the GC heap?

~~~
scott_s
It works just fine with destructors and finalizers, as this is the underlying
implementation of explicit memory allocation. One can build a true type-aware
object system on top of it. This is how C++ does it.

Please note that I am talking about techniques for explicit memory management,
not garbage collection.

~~~
Manishearth
Oh, you were talking about a non-GC situation. Right.

------
physguy1123
Something this doesn't mention is that for a given program written in both
Java and, say, C++, the C++ version will perform far fewer allocations than
the Java one. Escape analysis can help with this, but AFAIK it's not even a
full replacement for stack allocation, not to mention the other scenarios
where C++ doesn't allocate.

On the other side, this also doesn't bring up that any sort of concurrent data
structure is much easier to write when using a GC, and the GC removes the need
for expensive memory tracking. It doesn't necessarily make things faster, and
libraries like Crossbeam and epoch management in general make things easier,
but Crossbeam isn't fully featured, is Rust only, and still suffers from
strange bugs.

A worthwhile side note is that when manually managing memory, non-malloc
allocators can often be used if really needed that have far higher performance
than both malloc/free and a GC.

~~~
qznc
Instead of Java, a comparison should use a language with value types
(structs): C#, D, etc

~~~
sixbrx
I like that C#/F# have value types, but they have the downside of not being
able to do away with the default constructor for the type, meaning it's harder
and less efficient to maintain type invariants (you have to guard validity in
methods repeatedly instead of being able to guarantee valid construction). For
types for which a default/"0" value is a legal value, they are good, of
course.

~~~
qznc
I don't understand. What is the problem with default constructors?

~~~
sixbrx
The problem with default constructors is that for value types, you can't not
have one. For plenty of kinds of things, there's no default value that makes
any sense, but your type must admit such a value anyway; there is no choice.

Which means when you have a value of the type, you can't assume it's in a
valid state. You have to put checking code somewhere, either in each method of
the type, or in functions that receive the type as a parameter, in clients
that create objects of the type, etc.

------
albinofrenchy
This really is an apples to oranges comparison.

In C++, you have to know the difference between stack allocations and heap
allocations.

If you aren't dynamically allocating anything, malloc and free performance has
no bearing. As far as I know, Java has no reliable way to create an object
purely on the stack (and it probably shouldn't).

And even if you are dynamically allocating a new object in C++, if it has non-
pointer children, it allocates that memory too in the same allocation call.

It is true that a lot of allocations will slow down C++ a ton, but many
allocations can be avoided in C++, whereas in Java you don't get that fine a
level of control.

Even then, there is nothing about the malloc+free contract that prohibits an
implementation that does some of the things the author talks about. You can
make a version of free that marks memory locations to be freed at the nth
call of free.
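For example, a batching `free` wrapper along those lines is only a few lines
of C (a sketch; a real one would need a flush call at shutdown and some
thread-safety):

```c
#include <stdlib.h>

/* Batch frees: pointers are queued and only handed back to free()
 * once every BATCH calls, amortizing the deallocation cost. */
#define BATCH 64

static void  *pending[BATCH];
static size_t npending = 0;

void deferred_free(void *p) {
    pending[npending++] = p;
    if (npending == BATCH) {            /* flush at the nth call */
        for (size_t i = 0; i < BATCH; i++)
            free(pending[i]);
        npending = 0;
    }
}
```

This trades a higher high-water mark for fewer, cheaper bursts of free()
work, which is exactly the kind of latitude a GC also exploits.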

I'm curious on the source for this statement:

> Java's collector can free an unlimited number of objects in time independent
> of the number of objects freed.

That seems like it can't possibly be true in the generic case.

~~~
haimez
It's true, because the time to free objects is not related to the number of
objects freed -- it IS dependent on the number of objects that AREN'T freed.
The GC time (in a very general sense) is the amount of time it takes to
traverse the live object graph.

~~~
albinofrenchy
Isn't that true only for objects in the 'new' pool? Or does it actually
compact the old pool every gc?

~~~
haimez
I'm not sure I see the connection you're trying to make. The specifics of old
pool vs new pool (generation is the preferred term) are dependent on the GC
config, but in no case can you find memory that can be freed by traversing a
"dead objects" graph. Liveness is what drives the relationship between size
and computation time. Big O notation stuff.

~~~
albinofrenchy
So my understanding of the new generation is that it's essentially a giant
chunk of contiguous memory, and as you create new objects, you are just adding
to the end of it. For whatever reason, a GC event happens, and now you can
traverse the live graph, and anything alive in the new generation gets copied
to the old generation partition of memory. And now you just reset the counter
for the new object pool, and voila, you've deleted a whole bunch of young
objects just by not copying them.

But once it's in the old generation, you run right into the same issues with
fragmentation and having to keep track of where deleted objects are so you can
reuse that space. If the idea is that you just keep doing memcopies of all the
live objects to reuse the space and avoid looking at dead objects, I'd think
that is its own performance issue?

------
efaref
If anything, Rust shows that you can have reliable "manual" memory management
without the need for a GC. The best of both worlds.

While I don't buy the GC performance argument, it's ok to use it, because most
of the time it doesn't matter. Sure it can be "almost as good as" manually
managing memory, but that's about as hard to achieve as manually managing your
memory. The point of GC is that it relieves the programmer of caring about
managing memory, and as such it's only really useful where the system has
sufficient resources that the fact that a GC is running doesn't matter. For
the vast majority of workloads on modern hardware this is true.

At the other end of the scale, in real high-performance embedded systems, even
malloc+free is too slow, and memory fragmentation is too much of a problem for
things with expected uptimes measured in years. GC is a complete non-starter
here. Instead these systems use pools of fixed-size buffers that can be O(1)
allocated and freed, and never fragment. You can use MRU allocation of buffers
from these pools to keep the dcache warm, and you can make your memory-using
operations queue until they can reserve the memory they're going to use up
front so that your system can run flat out using all the memory it can without
ever crashing from OOM. Both malloc+free and GC (especially GC) can only dream
of those characteristics.

------
viraptor
I feel like this article falls into the typical trap: you either do fully
manual malloc/free, or stop-the-world mark & sweep. But those are the extremes
of the spectrum. What about arena allocators (in dynamic content, for example,
per-scene and per-frame)? What about selective refcounting? What about mixing
systems in various heaps? What about lightweight processes with their own,
well-adjusted local heap sizes? There are so many options available in the
real world. GC is not Java, and Java also works against a good GC with its
lack of value types.

~~~
PDoyle
I'm the author of the article, and I agree 100% with you: there are definitely
application-specific memory management schemes that beat both malloc and
automatic GC. In fact, I'm working on one right now: it's a way of storing a
huge balanced tree data structure in arrays to save space.

------
joeld42
This whole post is nonsense, but there's one point in particular that I think
GC advocates don't understand about managing memory. He writes, "Consider: you
need to hand out chunks of memory of any size, one at a time, with no advance
warning of usage patterns; and then you need to be able to free arbitrary
chunks at arbitrary times."

This is entirely wrong. Sure this is true if you're writing a general purpose
programming language, but if you're writing an app or a game or a tool, you
know the usage patterns because you are building them. You don't need to free
things at "arbitrary" times, you free chunks at times when it is appropriate
for your problem. You know more about your memory allocation patterns than any
general purpose GC could ever know, and so you can write code to fit those
better.

In a GC language, often you don't care about where and when the memory comes
from, but sometimes you do, and when you do, you're stuck.

~~~
qznc
You can put a manual allocator on top of a runtime GC, just like you put it on
top of malloc.

It is Java where you are stuck. Not GC languages in general.

~~~
efaref
Manual deallocators are more interesting.

PJSIP[1] has an interesting approach to this. SIP messages have lots of
strings associated with them, are usually relatively short-lived, and
typically you'll want to modify these strings or attach new ones. PJSIP does
this by hanging a string arena off of each SIP message object.

As you add strings to the SIP message, these get appended to the arena, which
is grown automatically as and when it needs to be. If you delete a string, it
just sticks around in the arena unused. Arena space can also be shared, e.g.
if you copy a substring from one SIP header to another, it'll just be referred
to twice using a pointer and length.

Finally, when you free the SIP message, it just frees the entire arena in one
go.
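A stripped-down version of that pattern might look like this in C
(hypothetical names; PJSIP's real pools grow in chained blocks rather than
using one fixed buffer):

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Each message owns a string arena: strings are appended, "deleted"
 * strings just stay unused, and freeing the message frees every
 * string at once. */
typedef struct { char *ptr; size_t len; } Str;  /* pointer + length slice */

typedef struct {
    char  *arena;
    size_t cap, used;
} Msg;

Msg *msg_new(size_t arena_cap) {
    Msg *m = malloc(sizeof *m);
    m->arena = malloc(arena_cap);
    m->cap   = arena_cap;
    m->used  = 0;
    return m;
}

Str msg_add_str(Msg *m, const char *s, size_t len) {
    assert(m->used + len <= m->cap);    /* real code would grow the arena */
    Str out = { m->arena + m->used, len };
    memcpy(out.ptr, s, len);
    m->used += len;                     /* no per-string free ever happens */
    return out;
}

void msg_free(Msg *m) {                 /* one free() releases all strings */
    free(m->arena);
    free(m);
}
```

Sharing works because a `Str` is just a pointer and a length into the arena;
two headers can reference the same bytes with no copy and no refcount.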

[1] [http://www.pjsip.org/](http://www.pjsip.org/)

~~~
PDoyle
Yes, that kind of "transaction-based" memory management is beautiful in
applications where it's appropriate: just leak memory until you're done with
the current task, and free it in one chunk. That's hard to beat for
performance and simplicity.

------
adrusi
One point that seems to be ignored in this discussion is that having a GC
creates problems for interfacing with foreign libraries. It's not too hard to
create hooks in the runtime to allow calling C functions, but it's very hard
to go the other way, and usually you have to write bridge code in C that links
to both the language runtime and the foreign library. And no matter what there
will be function call overhead that's unacceptable in some circumstances.

Language runtimes enable lots of other cool things, like JIT compilation and
runtime specialization, user-space schedulers and I'm sure some other things
as well, but if, as a language designer, all you want is GC, consider if
that's worth all the costs of having a runtime.

------
im_down_w_otp
This article seems like a bit of a strawman.

I don't know of anybody who complains that the cost of actually freeing memory
in GC'd languages is somehow significantly worse than in non-GC'd languages.
Everybody is ultimately complaining about how expensive it is to figure out
what is actually okay to free/release in the first place.

There are plenty of GC-backed languages/runtimes where the GC penalty is
isolated or mitigated because of other known constraints of the system.
Erlang's no-shared-mutable-state constraints allow its GC system to avoid
stopping the world and also allows processes to run on other schedulers while
GC is active on a different process running on a different scheduler.

------
norswap
One flaw in this article is that it assumes "naive malloc/free use". It
assumes that a program written in C and in Java (for instance) will need to
perform the same amount of allocations.

This is far from the truth. First, C has nested structs, whereas in Java all
object references are pointers. This leads to many more allocations in Java.
Second, C programmers are aware of the cost of malloc and will try to minimize
its use. Reusing memory whenever possible is common, as are simple memory
management strategies (such as arena allocation: allocate big chunks of
memory, do all your work therein, then free it all in one go, for instance at
the end of a processing step).

~~~
lmm
> Second, C programmers are aware of the cost of malloc and will try to
> minimize its use. Reusing memory whenever possible is common, as are simple
> memory management strategies (such as arena allocation: allocate big chunks
> of memory, do all your work therein, then free it all in one go, for
> instance at the end of a processing step).

But if you're willing to put the effort in to do something like that, there's
no reason you can't do the same thing in Java.

~~~
norswap
Ah but you can't. The GC will still pause and trace no matter what.

As far as I know, you can only control allocation at the level of
"retainment": you can keep something from being GC'ed by keeping a reference
to it. Potentially you can also reuse memory. So basically, object pools. This
is much less efficient than arenas, where there is one malloc and one free (or
one per memory page).

Also reusing memory is a bit ugly: you can't specify that things are final
anymore, which you can in C -- sure this is no hard guarantee (it isn't in
Java either) but it's a useful indication that during the lifetime of a
"logical object" (not its memory) it will not change.

I've tried using object pools a couple of times in Java, whenever allocations
dominated the running time, and it never helped.

~~~
lmm
> Ah but you can't. The GC will still pause and trace no matter what.

The tracing can happen concurrently with the appropriate config. And I
wouldn't be surprised if you could disable GC, at least for all practical
purposes (e.g. only GC at 100% space usage or every 10000 years, or some
such).

> As far as I know, you can only control allocation at the level of
> "retainment": you can keep something from being GC'ed by keeping a reference
> to it. Potentially you can also reuse memory. So basically, object pools.
> This is much less efficient than arenas, where there is one malloc and one
> free (or one per memory page).

AIUI you can do that by defining objects backed by an unsafe buffer
(technically a non-public API, but there's an understanding that it will be
supported and the wheels are turning towards standardizing either it or
something equivalent that supports the important use cases).

> Also reusing memory is a bit ugly: you can't specify that things are final
> anymore, which you can in C -- sure this is no hard guarantee (it isn't in
> Java either) but it's a useful indication that during the lifetime of a
> "logical object" (not its memory) it will not change.

I didn't think C supported final at all? I agree that it's less convenient
than it might be and a lot less convenient than standard Java, but I'm not
convinced it's really any worse than what you'd be doing in C.

------
arcticbull
I think all the time spent over the last 50 years trying to improve GC was a
kind of sunk-cost fallacy. Instead of answering the question "how do we make
GC better/faster/less laggy/take less memory/not freeze the world/be more
deterministic" people should have been investing in "how do we elide object
lifetimes so programmers don't have to think about them."

Especially since GC doesn't map to the real world in important ways, for
instance OS resources such as sockets/threads/pipes/files. You have to build
your own reference counting on top of these things anyways.

This is why I have a lot of hope in Rust -- not necessarily because I see it
as the ideal, rather because they are asking the latter question and they've
got some pretty cool results.

It's time to let go of 'GC' in the classic sense and figure out how we can
elide object lifetimes in a clean, rational and deterministic way.

~~~
dboreham
I wish I could upvote this 10 times. In C the programmer has to manage memory,
which is a chore, but the programmer can do it because they _know_ something
about the lifetime of their data. In GC languages that knowledge is sent to
/dev/null, and then heroic efforts are expended to attempt to machine-re-learn
it again at runtime. This is crazy. As you observe, the better plan is to
relieve the programmer of the chore of manual management, but to do so while
retaining their knowledge and understanding of the behavior of their code.

------
chris_wot
I'm curious about the following:

 _Here's how Java allocates an object of size N: round N up to a multiple of
8 bytes, and bump the allocation pointer by that much. That's it. By using a
scheme called a thread-local heap (TLH), there is no contention between
threads. There's some additional handling for large objects, and for the case
when the TLH is exhausted and another one must be allocated, but generally
speaking, the GC is self-tuning so that none of these activities appear on the
hot path of an application. The overhead is generally negligible, and all
you're left with is the pointer bump._

What is to stop someone from implementing something like this in a C library?
It sounds like it could be relatively easily done... or is it the TLH that
causes issues?
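The fast path itself looks easy enough to reproduce with C11 thread-local
storage; a rough sketch of what I mean (hypothetical names):

```c
#include <stddef.h>
#include <stdint.h>

/* Java-style TLH in C: each thread bump-allocates from its own chunk,
 * so the fast path is "round up to 8, advance a pointer", with no locks. */
#define TLH_SIZE (64 * 1024)

static _Thread_local _Alignas(8) uint8_t tlh[TLH_SIZE];
static _Thread_local size_t tlh_top = 0;

void *tlh_alloc(size_t n) {
    n = (n + 7) & ~(size_t)7;       /* round up to a multiple of 8 */
    if (tlh_top + n > TLH_SIZE)
        return NULL;                 /* slow path: would fetch a new TLH */
    void *p = &tlh[tlh_top];
    tlh_top += n;                    /* the pointer bump */
    return p;
}
```

Presumably the catch is reclamation: without a compacting GC the chunk can
only be reset wholesale, which is why in C this idea surfaces as arena
allocators rather than as a general-purpose malloc replacement.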

~~~
Manishearth
You are correct. As I mentioned in
[https://news.ycombinator.com/item?id=11426339](https://news.ycombinator.com/item?id=11426339),
jemalloc does this
([http://linux.die.net/man/3/jemalloc](http://linux.die.net/man/3/jemalloc),
see the implementation notes), and you can use it pervasively so that all your
existing malloc-using code links to jemalloc instead of the regular malloc.
Ultimately the allocation strategy is an orthogonal concern to garbage
collection, it's just that many GCs spend a lot more time worrying about the
allocation strategy since they need to.

It actually goes one step further than the TLH and does round-robin arena
assignment, sharing the load between your threads.

------
Jasper_
My issue is that with GC-able languages, people don't seem to care as much
about allocating lots of data, expecting the GC to wipe it all away.

To give a recent example, HTML5's canvas API only allows accessing image data
with getImageData, which will allocate a large array for you, and then return
it. You cannot do this at 60fps, because of GC pauses (and also because Chrome
doesn't actually track the memory usage of the ImageData correctly, leading it
to run out of memory rather than correctly free the object).

There is no way to pass in a pre-allocated buffer, because the presumed cost
of the GC is free. When developers are asked to track memory usage themselves,
they're allowed to be a lot more careful with it, and design APIs that take
that cost into account.

------
Findeton
If you need to do some low latency and high throughput stuff, sometimes you
just can't afford a garbage collector. Like in many videogames.

------
chipsy
I will allow that GC can be fast, but you have to work hard and have
appropriate language features to keep it "chunky". If you have millions of
small traces to follow, the n of your O(n) is going to be big. In contrast a
manual collection algorithm can rely on user supplied metadata to free the
whole dataset in one instant without having a tracing step.

In apps that need to scale, designing data to be efficiently allocated and
freed becomes crucial, and the reference-by-default strategy really falls
down.

------
chetanahuja
It's been what, 20+ years we've been hearing the theoretical arguments about
how automatic GC can be soooo much better than manual or ref-counted heaps and
yet...
[https://docs.oracle.com/javase/8/docs/technotes/guides/vm/gc...](https://docs.oracle.com/javase/8/docs/technotes/guides/vm/gctuning/g1_gc_tuning.html)

~~~
dboreham
I like to joke that you could simply comment out the body of free() in any
C/C++ program and achieve 99% of the benefits of GC.

------
crudbug
One common theme I have seen is all the Big Data Java solutions using home-
baked solutions for off-heap storage. This is essentially malloc/free for the
Java world, using the sun.misc.Unsafe API.

The problem is that the Java specs do not dictate native access besides JNI.
One standard off-heap API, which provides all the goodies of the native world
and escapes GC, would be a step in the right direction. It is up to the dev to
choose which world the object lives in.

------
Manishearth
This article seems pretty muddled.

First, one reason GCd languages are slow is because they GC more than
necessary; not necessarily because of the GC itself. It's not always possible
to statically determine what needs to be GCd. Go is pretty good at this,
though, especially since pointers are explicit unlike Java. You also have
cache issues here. Even if the GC tries to assuage these cache issues by
smartly placing objects, this is a far cry from the speed you get from a
cached stack.

Secondly, the "GC can free when it wants" is not an indictment of malloc-based
heap usage. You can easily plug in an allocator that does the same thing with
respect to freeing memory at an opportune moment. IIRC jemalloc does this. You
can argue that jemalloc isn't malloc, but since your programming style or
language doesn't need to change if you want to use jemalloc over malloc, it
doesn't matter -- comparing GC to malloc isn't the real question at hand here,
comparing GC to a malloc-like memory management API (or an abstraction over
it) is.

Almost all the improvements that a GC has over malloc-like memory management
mentioned in the article can be implemented as a policy in a jemalloc-like
pluggable allocator. Smartly managing malloc() calls and delaying free() calls
is orthogonal to garbage collection, it's just that garbage collectors
invariably do this since it's necessary to offset the performance hit of
putting everything on the heap.

The only legitimate performance feature mentioned there is that a moving GC
can handle the cache smartly. It is true that malloc can't do this. However,
malloc doesn't have cache problems in the first place -- managing the cache in
a GC is making the best out of a sticky situation; malloc doesn't cause that
situation in the first place.

------
skybrian
This all seems language and implementation specific.

Comparing Go and Java, I have the general impression that garbage collection
in Go has reached the point where it just works, while in Java we're still
tuning it after all these years.

But part of the reason might be that creating garbage is more easily avoided
in Go. Or it might be that people are attempting different things.

~~~
adrusi
I've been hearing (from other HN comments) that Go 1.6's GC is still
considered less performant than HotSpot's in the average case, and has
significantly worse performance in some select edge cases.

Historically Java competed with C++, and its evolution has been defined by
trying to get it to match C++'s performance in the general case. Go prefers
simple, sound and obvious to maximum performance.

~~~
vvanders
Java is never going to compete with C++ until it has proper memory placement
semantics.

If I know how to arrange my data, I can beat Java by 50x.

The thing is, so few people care about (or need) this these days that stuff
like the original article gets written.
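A minimal sketch of what "arranging my data" can mean, assuming a struct-of-arrays layout: instead of an array of point objects (one header and one pointer indirection per point), keep the coordinates in parallel primitive arrays, packed contiguously. The names here are invented; this is the kind of placement C++ expresses directly and Java can only approximate:

```java
// Sketch: two layouts for a collection of 2-D points.
public class PointSum {
    // Array-of-objects layout: every Point is a separate heap
    // allocation, scattered wherever the allocator (and GC) put it.
    static final class Point {
        final int x, y;
        Point(int x, int y) { this.x = x; this.y = y; }
    }

    // Struct-of-arrays layout: a bare primitive array. A linear
    // scan touches contiguous memory, so the prefetcher can help.
    static long sumX(int[] xs) {
        long total = 0;
        for (int x : xs) total += x;
        return total;
    }

    public static void main(String[] args) {
        int n = 1_000_000;
        int[] xs = new int[n];
        for (int i = 0; i < n; i++) xs[i] = i;
        System.out.println(sumX(xs));
    }
}
```

The 50x figure is the commenter's claim, but the mechanism is uncontroversial: the second layout trades object identity for cache locality and zero per-element GC pressure.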

------
vmorgulis
What I'd like is a language that lets me trap dynamic allocations with
knowledge of the callsite.

Instead of having specialized objects for manipulating memory (like deque<>,
vector<> ...), this language would statically provide a profile (or signature)
of the stack at compile time. It would allow the user to apply strategies that
JITs use (like inline caching).

For example, detect that an array is push-only or shift-only:

\- if it's push-only, preallocate memory in chunks (like vector::reserve()).

\- if it's shift-only, trap the [] operator and move the head of the array by
an offset (no resize).

The issues of GC and malloc come down to context.
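The shift-only strategy above can be sketched roughly like this (names invented for illustration): reads go through a head offset, so a shift is O(1) and never copies or resizes the backing array:

```java
// Sketch of a shift-only array: instead of copying elements down on
// every shift, keep a head offset and route reads through it.
public class ShiftOnlyArray {
    private final int[] data;
    private int head;   // index of logical element 0
    private int size;

    public ShiftOnlyArray(int[] initial) {
        data = initial;
        size = initial.length;
    }

    // O(1) shift: bump the head instead of moving memory.
    public int shift() {
        size--;
        return data[head++];
    }

    // The "trapped" [] operator: logical index i is data[head + i].
    public int get(int i) {
        return data[head + i];
    }

    public int size() { return size; }

    public static void main(String[] args) {
        ShiftOnlyArray a = new ShiftOnlyArray(new int[]{10, 20, 30});
        System.out.println(a.shift()); // 10
        System.out.println(a.get(0));  // 20
    }
}
```

The commenter's point is that a compiler armed with a callsite profile could choose this representation automatically, rather than the programmer picking a container up front.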

------
jstimpfle
I've had to use Java for sets and maps of millions of small objects
(containing 1-3 integers). The garbage collector choked pretty early, after a
few (< 20) million of them (I think there is no way to hash integers in
standard Java without significant object overhead).

Knowing almost nothing about the JVM, what's the best way to go about this? I
considered implementing my own sets / maps on naked int[] / structs of 1-3
naked int[]s.

~~~
haimez
Objects are expensive. If you're dealing with millions of them you definitely
want to 1.) use primitives where possible (ie: int, not Integer) and 2.) put
those primitives in native arrays instead of collections (ie: int[] and not
List<Integer>). It sounds like in your use case that means going from the
collection of objects pattern to the object of collections pattern.
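As a rough illustration of that pattern (a toy, not production code; libraries like fastutil ship hardened primitive collections), here is an int set built on a bare int[] with open addressing. It uses 0 as the empty-slot marker for brevity, so this sketch can't store 0:

```java
// Sketch: an "object of collections" int set. Open addressing on a
// single int[] means no boxing, no Entry objects, and essentially
// nothing for the GC to trace.
public class IntSet {
    private final int[] slots;  // 0 marks an empty slot (toy limitation)
    private final int mask;

    public IntSet(int capacityPow2) {
        slots = new int[capacityPow2];  // capacity must be a power of two
        mask = capacityPow2 - 1;
    }

    public void add(int key) {
        int i = hash(key) & mask;
        // Linear probing: walk until we find the key or an empty slot.
        while (slots[i] != 0 && slots[i] != key) i = (i + 1) & mask;
        slots[i] = key;
    }

    public boolean contains(int key) {
        int i = hash(key) & mask;
        while (slots[i] != 0) {
            if (slots[i] == key) return true;
            i = (i + 1) & mask;
        }
        return false;
    }

    // Cheap integer mix so sequential keys spread across the table.
    private static int hash(int x) {
        x *= 0x9E3779B9;
        return x ^ (x >>> 16);
    }

    public static void main(String[] args) {
        IntSet s = new IntSet(1 << 20);
        for (int i = 1; i <= 100_000; i++) s.add(i);
        System.out.println(s.contains(77_777));  // true
        System.out.println(s.contains(500_000)); // false
    }
}
```

At twenty million entries this is one 2^25-slot array (about 128 MB) instead of twenty million objects, which is exactly the difference the GC was choking on.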

------
Ericson2314
Someday, when Rust gets its GC interface, we can do some proper benchmarks.

~~~
Manishearth
It's still apples to oranges, because Rust will only ever have a fine-grained,
optional GC (you choose which objects to put into the GC), not a pervasive
one.

~~~
Ericson2314
Well, the second step is to make an allocator crate that puts the GC in as
the system allocator for std.

~~~
Manishearth
As far as Java is concerned, _still_ apples-to-oranges since Rust doesn't
pervasively mallocate either.

Go would be an ok comparison. I suspect Go would come out on top, since it
actually tries to avoid GCing things, whereas the Rust compiler wouldn't even
be aware that there is a GC going on and wouldn't do any analysis.

~~~
Ericson2314
I mean compare the same rust programs, with jemalloc vs with GC.

~~~
Manishearth
Ah, okay. That can work.

