Hacker News

These GCs have a 1GB overhead when working with a 16GB heap. That’s 6% of your available memory. I was expecting modern GCs to be way more efficient, so I’m kinda shocked that this is the state of the art.

I also wonder how much of the progress toward the “Sub (200) millisecond” latency target is due to us just having faster machines. Honestly, I have no model for tying this to the actual performance of my code. I guess it translates, but I’m not really sure how.

Not bagging on Java; I’m just surprised how inefficient an industrial-strength GC can be. I understand why manual memory management still holds its own.

> These GCs have a 1GB overhead when working with a 16GB heap. That’s 6% of your available memory.

6% memory overhead while preserving throughput is actually quite excellent. It's just a little more than the average fragmentation overhead under manual memory management.

Manual memory management can "hold its own" because it can tailor the allocation/release profile to the problem and aggregate some of those overheads.
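To make "tailor the allocation/release profile and aggregate overheads" concrete, here is a minimal Python model of an arena (bump) allocator, one of the tailored schemes manual management allows. It is a simulation, not a real allocator: `Arena`, `alloc` and `reset` are hypothetical names, "memory" is just a bytearray, and allocations are offsets into it. Many objects are carved out of one block with no per-object header, and all of them are released by a single `reset()` instead of one free per object.

```python
class Arena:
    """Toy bump allocator: hands out offsets into one preallocated buffer."""

    def __init__(self, capacity: int) -> None:
        self.buf = bytearray(capacity)  # the single backing block
        self.top = 0                    # next free offset (the "bump pointer")

    def alloc(self, size: int, align: int = 8) -> int:
        """Return the offset of `size` fresh bytes, aligned to `align` (a power of two)."""
        start = (self.top + align - 1) & ~(align - 1)  # round up to the alignment
        end = start + size
        if end > len(self.buf):
            raise MemoryError("arena exhausted")
        self.top = end
        return start

    def used(self) -> int:
        return self.top

    def reset(self) -> None:
        """Release every allocation at once; no per-object bookkeeping needed."""
        self.top = 0


arena = Arena(1024)
offsets = [arena.alloc(20) for _ in range(10)]  # ten 20-byte objects
print(offsets[:3], arena.used())  # [0, 24, 48] 236  (4 bytes of padding between objects)
arena.reset()
print(arena.used())               # 0
```

The only per-allocation cost here is alignment padding; the trade-off is that nothing can be freed individually, which is why this only fits workloads where objects share a lifetime (e.g. one request).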

> due to us just having faster machines

?? These benchmarks keep the machine constant. And I’m not sure we’ve seen faster machines in a long time. Clock speeds have remained fairly constant and gains are solely in more cores.

> Clock speeds have remained fairly constant and gains are solely in more cores.

Instructions per clock (IPC) have risen steadily, even though frequencies have stayed about the same; IPC has roughly doubled since 2011, according to Cinebench [0].

That being said, I agree the faster machines argument doesn't hold much water.

[0]: https://cpugrade.com/a/i/articles/cbr15-ipc-comparison.png


Sub-millisecond pauses are already here with ZGC, and that has little to do with faster machines (machines haven't become that much faster over the past ten years) and much more to do with more sophisticated algorithms.

Since RAM is the cheapest resource to scale (and using it cleverly saves energy: Java uses the least energy of the managed languages), this seems like a very good tradeoff.

Accessing RAM is very slow and CPU caches don't scale, though, so it's not as simple as you're presenting it.

Well, for general computing I don’t think we have an answer either way. Yeah, for some hot loop over an array we get insane performance, but that is a very specific workload and not applicable everywhere. What about a chat application: which parts of its memory should be physically close to each other?

Also, Java’s GCs are compacting, which puts objects of similar age relatively close to each other. The same chat program written in C might use a linked list with worse locality, so it is really not obvious to me what a good solution would be.
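As a rough illustration of why compaction helps locality, here is a toy sliding ("mark-compact", Lisp-2 style) pass in Python. It is a simplified model, not how any particular Java collector works (several of them evacuate/copy regions instead of sliding): the heap is a list of equal-sized slots, the live set is given rather than discovered by tracing, and `Obj`/`compact` are hypothetical names. Live objects slide toward the start of the heap in their original order, so objects allocated close together stay adjacent, and references are rewritten through a forwarding table.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set


@dataclass
class Obj:
    name: str
    fields: List[int] = field(default_factory=list)  # heap indices this object points to


def compact(heap: List[Optional[Obj]], live: Set[int]) -> List[Optional[Obj]]:
    """Toy Lisp-2-style sliding compaction over a heap of equal-sized slots.

    `live` holds indices of reachable objects (assumed already marked and closed
    under reachability, so live objects only point to live objects).
    Root pointers would be updated the same way; they are omitted here.
    """
    # Phase 1: compute each live object's new address, walking in heap order.
    forward: Dict[int, int] = {}
    free = 0
    for i, obj in enumerate(heap):
        if obj is not None and i in live:
            forward[i] = free
            free += 1
    # Phase 2: rewrite references inside live objects to the new addresses.
    for old in forward:
        heap[old].fields = [forward[f] for f in heap[old].fields]
    # Phase 3: slide objects down; slots from `free` onward become empty.
    # (A real collector moves objects in place; a new list keeps this simple.)
    compacted: List[Optional[Obj]] = [None] * len(heap)
    for old, new in forward.items():
        compacted[new] = heap[old]
    return compacted


heap = [Obj("a", [3]), Obj("tmp1"), Obj("b"), Obj("c", [0]), Obj("tmp2"), Obj("d", [2])]
heap = compact(heap, live={0, 2, 3, 5})
print([o.name if o else None for o in heap])            # ['a', 'b', 'c', 'd', None, None]
print(heap[0].fields, heap[2].fields, heap[3].fields)   # [2] [0] [1]
```

Note that the relative order of survivors never changes, which is where the "similarly old objects end up close together" property comes from.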

It's worth noting glibc isn't zero-overhead, either.

Manual memory management can also have overhead due to e.g. fragmentation. I don't know if it approaches 6% in realistic scenarios.
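The kind of fragmentation meant here can be shown with a toy first-fit free-list allocator in Python. It is a hypothetical model (`FirstFit`, `malloc`, `free` are illustrative names; real mallocs use size classes, bins and other policies): after freeing every other block, half the heap is free in total, yet a request twice the block size fails because no single hole is big enough.

```python
from typing import Dict, List, Optional, Tuple


class FirstFit:
    """Toy first-fit allocator: a sorted free list of (start, size) holes."""

    def __init__(self, capacity: int) -> None:
        self.free_list: List[Tuple[int, int]] = [(0, capacity)]
        self.sizes: Dict[int, int] = {}  # start offset -> size of each live block

    def malloc(self, size: int) -> Optional[int]:
        for i, (start, hole) in enumerate(self.free_list):
            if hole >= size:  # take the first hole that is big enough
                if hole == size:
                    del self.free_list[i]
                else:
                    self.free_list[i] = (start + size, hole - size)
                self.sizes[start] = size
                return start
        return None  # no single hole is large enough

    def free(self, start: int) -> None:
        self.free_list.append((start, self.sizes.pop(start)))
        self.free_list.sort()
        # Coalesce holes that touch, so adjacent frees merge into one hole.
        merged: List[Tuple[int, int]] = []
        for s, n in self.free_list:
            if merged and merged[-1][0] + merged[-1][1] == s:
                merged[-1] = (merged[-1][0], merged[-1][1] + n)
            else:
                merged.append((s, n))
        self.free_list = merged

    def total_free(self) -> int:
        return sum(n for _, n in self.free_list)

    def largest_hole(self) -> int:
        return max((n for _, n in self.free_list), default=0)


allocator = FirstFit(1000)
blocks = [allocator.malloc(100) for _ in range(10)]  # fill the heap completely
for b in blocks[::2]:                                # free every other block
    allocator.free(b)
print(allocator.total_free(), allocator.largest_hole(), allocator.malloc(200))
# 500 100 None
```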

Wait, fragmentation is not included in that 6%. The 6% is what a GC with no fragmentation already costs. A region-based GC suffers additional temporary fragmentation on top of that, because some regions can be only partially filled with live objects until they are compacted. That effect might actually be bigger, because allocation is done only from contiguous space: the GC can't just allocate from the "holes" until it compacts them. You also need some extra room to allocate from in order to avoid overly frequent GCs. So in practice it is not 6% but sometimes 600%.
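The region effect described above can be sketched with a toy Python model. It is a simplification under stated assumptions (fixed-size regions, allocation only by bumping a pointer in the current region or taking a fresh empty one, and an explicit `evacuate` step standing in for compaction); it is not how G1, ZGC or Shenandoah are actually implemented, and all names are hypothetical. It shows the heap holding several times its live data after most objects die, until the survivors are copied out and their regions released.

```python
from typing import List


class Region:
    """A fixed-size region filled by bumping a pointer."""

    def __init__(self, size: int) -> None:
        self.size = size
        self.top = 0                # bytes handed out so far (the bump pointer)
        self.live: List[int] = []   # sizes of objects in this region still alive

    def try_alloc(self, n: int) -> bool:
        if self.top + n > self.size:
            return False            # space of dead objects below `top` is never reused
        self.top += n
        self.live.append(n)
        return True


class RegionHeap:
    def __init__(self, region_size: int) -> None:
        self.region_size = region_size
        self.regions: List[Region] = [Region(region_size)]

    def alloc(self, n: int) -> None:
        if self.regions[-1].try_alloc(n):
            return
        fresh = Region(self.region_size)  # current region is full: take an empty one
        if not fresh.try_alloc(n):
            raise ValueError("object larger than a region")
        self.regions.append(fresh)

    def committed(self) -> int:
        """Memory held by the heap: every region in use, however empty."""
        return len(self.regions) * self.region_size

    def live_bytes(self) -> int:
        return sum(sum(r.live) for r in self.regions)

    def evacuate(self) -> None:
        """Copy every survivor into fresh regions and release the old ones."""
        survivors = [n for r in self.regions for n in r.live]
        self.regions = [Region(self.region_size)]
        for n in survivors:
            self.alloc(n)


heap = RegionHeap(region_size=1024)
for _ in range(64):
    heap.alloc(128)                 # 64 x 128 B fills exactly 8 regions
for r in heap.regions:
    r.live = r.live[:1]             # most objects die: one survivor per region
print(heap.committed(), heap.live_bytes())  # 8192 1024
heap.evacuate()
print(heap.committed(), heap.live_bytes())  # 1024 1024
```

In this contrived case the heap holds 8x its live data until evacuation. Real collectors choose which regions to evacuate (typically those with the most garbage) and also keep free headroom, so the actual ratio varies widely.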

Yes, sure. But you do still need to compare the total overhead associated with manual memory management (which is not zero) to the total overhead associated with GC. It's an empirical question which is larger in any given case. And of course it depends on the implementation of malloc and the implementation of GC.

I'm not expressing a view as to whether the overheads of manual memory management are typically comparable or not. I really don't know.

The total memory overhead of manual memory management can be virtually zero if you're careful enough to avoid fragmentation (and you can, because it is manual, so you control a lot of the details). For example, there is no fixed header on each allocated chunk. In Java you pay an additional 16B for each allocated object, and you can't get away from that overhead.
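To put the per-object header in numbers, here is a small Python calculation. It takes the 16-byte header from the comment above as an assumption (the real figure depends on the JVM and on settings such as compressed class pointers) and ignores alignment padding; `header_overhead` is a hypothetical helper. It shows that a fixed header matters most for small objects.

```python
def header_overhead(payload_bytes: int, header_bytes: int = 16) -> float:
    """Share of an object's footprint taken by a fixed header (padding ignored)."""
    return header_bytes / (header_bytes + payload_bytes)


for payload in (8, 16, 64, 1024):
    print(payload, f"{header_overhead(payload):.1%}")
# 8 66.7%
# 16 50.0%
# 64 20.0%
# 1024 1.5%
```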

Yes, manual memory management can always be better with arbitrary amounts of tuning. However, I think the more interesting question is how the overheads compare in typical applications written in a reasonably straightforward and maintainable style. In practice, fragmentation can be a difficult problem to avoid when using manual memory management. For example, Firefox struggled with it for a long time.

A web browser is a huge and complex app. The argument that it struggled with memory issues is moot unless you can show a comparably featured app written in Java to compare memory use against.

Anyways, I've got plenty of anecdotal evidence of Java apps taking an order of magnitude more memory than their close counterparts written in languages that use manual memory management. Not browsers, but things like benchmarking tools, web servers or even duplicate file finders (shameless plug: https://github.com/pkolaczk/fclones#benchmarks - there is one Java app there, see its memory use :D)

We’re talking at cross purposes here. I did say that I wasn’t expressing a view as to whether the overheads of manual memory management are typically larger than the overheads of GC. I don't know if they are or not.

The point I was making was just that you do need to compare empirically the typical overheads of each to make a meaningful comparison. The 6% figure in isolation doesn’t tell us very much.

As you point out, it is difficult to make these comparisons on the basis of anything other than anecdotal evidence, since it is rare for applications of significant size or complexity to be implemented in multiple languages.

According to the CppCon talk about Mesh [1] (an allocator that implements compaction for C++ programs), the overhead can be massive too (17% overhead measured on Firefox, 50% on Redis!)

[1] https://youtu.be/XRAP3lBivYM?t=1374
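The core idea behind Mesh, as presented in that talk, is to merge ("mesh") two partly filled pages whose live objects sit at non-overlapping offsets, so both virtual pages can be backed by one physical page without changing any object's address. The Python sketch below only models the occupancy check with bitmaps (one bit per object slot of a single size class) and a naive greedy pairing; Mesh itself uses a randomized search and remaps virtual memory, and the names `can_mesh`/`greedy_mesh` are hypothetical.

```python
from typing import List, Tuple


def can_mesh(a: int, b: int) -> bool:
    """Two pages can mesh if no slot is occupied in both (bitmaps are disjoint)."""
    return a & b == 0


def greedy_mesh(pages: List[int]) -> Tuple[List[int], int]:
    """Pair up meshable pages greedily; return the physical pages and how many were freed."""
    remaining = list(pages)
    physical: List[int] = []
    freed = 0
    while remaining:
        current = remaining.pop(0)
        for i, other in enumerate(remaining):
            if can_mesh(current, other):
                current |= other    # objects keep their offsets; both pages share one frame
                del remaining[i]
                freed += 1
                break
        physical.append(current)
    return physical, freed


pages = [0b10010001, 0b01100100, 0b00001010, 0b11000000]
physical, freed = greedy_mesh(pages)
print([format(p, "08b") for p in physical], freed)  # ['11110101', '11001010'] 2
```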

Nice link. I guess theoretically you can always optimise this in languages with manual management. With a complex GC you have to figure out a way to tame the beast, and I’m not sure that’s easier to reason about.
