JVM statistics cause garbage collection pauses (2015)

pjmlp · 2024-09-19T16:39:54 1726763994

For proper statistics use Visual VM or Flight Recorder, if using an OpenJDK derived JVM implementation.

Also note that not all JVMs are made alike, and there are plenty to chose from.

hashmash · 2024-09-19T17:17:29 1726766249

When using the `-XX:+PerfDisableSharedMem` workaround, VisualVM cannot attach to the running process anymore.

sltkr · 2024-09-19T16:02:42 1726761762

> The pauses occur even [..] if you call mlock

I wonder how this is even possible. The only scenario I can think of involves a page fault on the page table itself (i.e., the page is locked into memory, but a page fault occurs during virtual-to-physical address translation). Does anyone know the real reason?

survivedurcode · 2024-09-19T16:25:03 1726763103

Probably because pages mapped, even if they are locked into memory are not allowed to stay dirty forever. Does this help? https://stackoverflow.com/a/11024388 (In contrast, if you mlocked but never wrote to the pages, you probably would not encounter read pauses)

cogman10 · 2024-09-19T15:41:21 1726760481

> in /tmp

Why is `/tmp` on disk and not a tmpfs mount?

sltkr · 2024-09-19T16:21:25 1726762885

There is no law that says /tmp must be on tmpfs, and historically this wasn't done, because tmpfs is limited in size to some faction of the kernel's memory, while /tmp may be used to store much larger files.

For example, GNU sort can sort arbitrarily large input files, which is implemented by splitting the input into sorted chunks that are written to a temporary directory, /tmp by default. But this is based on the assumption that /tmp can store significantly larger files than fit in memory, otherwise the point is moot. So using tmpfs makes /tmp useless for this type of operation.

In the end, it's a trade-off between performance and disk space. I also prefer to mount /tmp on tmpfs for performance reasons, but you should not assume that this is the case on all systems.

aidenn0 · 2024-09-19T17:53:24 1726768404

While I run /tmp on disk, I should point out that tmpfs is not limited to the size of RAM; contents of tmpfs can be swapped out just like any other memory allocation.

funcDropShadow · 2024-09-20T06:34:22 1726814062

That is one of the reason why we should still have swap space.

aidenn0 · 2024-09-19T17:51:15 1726768275

Why would I want it on tmpfs? Only advantage I see is slightly improved boot times (/tmp is typically cleared on boot, which is obviously not necessary for tmpfs).

hinkley · 2024-09-19T20:04:07 1726776247

Slightly simpler handling for docker containers. Particularly if you run multiple copies of the same image on one box (blue-green deploys, process-per-cpu programming languages, etc)

TacticalCoder · 2024-09-19T23:55:37 1726790137

> Why would I want it on tmpfs?

It's now there in several distros by default. Not that it answers your question.

ta988 · 2024-09-19T15:14:56 1726758896

Is it still the case?

ackfoobar · 2024-09-19T16:34:58 1726763698

Probably yes.

https://bugs.openjdk.org/browse/JDK-8076103

Closed with "Won't Fix".

flykespice · 2024-09-19T17:44:02 1726767842

...With no reasoning at all?

ackfoobar · 2024-09-19T17:57:13 1726768633

A bit more context in the mailing list:

> It's a non-issue with a pure ram-based file system. Or tmpfs with no swap.

https://mail.openjdk.org/pipermail/hotspot-runtime-dev/2015-...

lbalazscs · 2024-09-19T15:56:45 1726761405

In 2015 there was no ZGC. Today ZGC (an optional garbage collector optimized for latency) guarantees that there will be no GC pauses longer than a millisecond.

survivedurcode · 2024-09-19T16:28:40 1726763320

I would check your answer. These are pauses due to time spent writing to diagnostic outputs. These are not traditional collection pauses. This affects both jstat as well as writes of GC logs. (I.e. GC log writes will block the app just the same way)

pjmlp · 2024-09-19T16:38:37 1726763917

Which is why for anything serious one should be using Flight Recorder instead.

funcDropShadow · 2024-09-20T06:32:28 1726813948

Or /tmp should be a tmpfs as it is on most current Linux distributions.

esaym · 2024-09-19T19:58:02 1726775882

These modern garbage collectors are not simply free though. I got bored last year and went on a deep dive with GC params for Minecraft. For my needs I ended up with: -XX:+UseParallelGC -XX:MaxGCPauseMillis=300 -Xmx2G -Xms768M

When flying around in spectator mode, you'd see 3 to 4 processes using 100%. Changing to more modern collectors just added more load to the system. ZGC was the worst, with 16+ processes all using 100% cpu. With the ParallelGC, yes you'll get the occasional pause but at least my laptop is not burning hot fire.

plandis · 2024-09-19T20:30:58 1726777858

Yes no GC is free (well perhaps Epsilon comes close :)

It’s a low pause GC so latencies, particularly tail latencies, can be more predictable and bounded. The tradeoff you make is that it uses more CPU time and memory in order to operate.

mike_hearn · 2024-09-20T08:14:25 1726820065

Minecraft really needs generational ZGC (totally brand new) because Minecraft generates garbage at prodigious rates and non-generational GC collects less garbage per unit time.

namibj · 2024-09-19T20:08:07 1726776487

You'll need more spare heap for ZGC.

ackfoobar · 2024-09-19T21:21:03 1726780863

And using generational ZGC will probably lower CPU usage a lot.

tuna74 · 2024-09-21T07:41:19 1726904479

Yes, this is why GCs work so bad for 3D games since you are usually limited by memory bandwidth and latency, especially on systems with unified RAM (no seperate GPU RAM).

kanzenryu2 · 2024-09-19T19:52:34 1726775554

Sadly in many cases no; it's not magic. This nirvana is restricted to cases where there is CPU bandwidth available (e.g. some cores idle) and plenty of free RAM. When either CPU or RAM are less plentiful... hello pauses my old friend.

sunshowers · 2024-09-19T20:04:58 1726776298

This is why memory-bound services generally use languages without mandatory GC. Tail latency is a killer.

Rust's memory management does have some issues in practice (large synchronous drops) but they're relatively minor and easily addressed compared to mandatory GC.

foobarchu · 2024-09-20T04:00:24 1726804824

In cases where java is unavoidable and you're working with large blocks, it is possible to sort of skirt around the gc with certain kinds of large buffers that live outside the heap.

I've used these to great success when I had multiple long-lived gigabyte+ arrays. Without off-heap memory, these tended to really slow the gc down (to be fair, I didn't have top of the line gc algorithms because the openj9 jvm had been mandated)

pkolaczk · 2024-09-20T06:39:45 1726814385

Managing off heap memory in Java is pain even worse than manual memory management in C. Unlike C++ and Rust, Java offers no tools for manual memory management, and its idioms like frequent use of exceptions make writing such code extremely error prone.

foobarchu · 2024-09-29T19:37:06 1727638626

ByteBuffers and direct memory make it possible.

https://docs.oracle.com/javase/8/docs/api/java/nio/ByteBuffe...

But it is a pain and only really useful if you have a big, long lived object. In my case it was loading massive arrays into memory for access by the API server frontend. They needed to be complete overwritten once an hour, and it turns out that allocating 40% of system memory then immediately releasing another 40% back to the GC at once is a good recipe for long pauses or high CPU use

hawk_ · 2024-09-19T16:52:16 1726764736

ZGC doesn't remove safepoint requests on threads which is the root cause. "Guarantees" here are with very heavy quotes.

funcDropShadow · 2024-09-20T06:36:42 1726814202

But it reduces the amount of safepoint requests by doing more in parallel to the working application.

hinkley · 2024-09-19T19:59:58 1726775998

The cost of statistics gathering on a GC implementation that avoids ineffective GC activity is less affected by the cost of telemetry (no news is good news), but it is still affected.

ahoka · 2024-09-20T05:50:08 1726811408

Not with Linux 5.x AFAIK.

jakewins · 2024-09-19T18:48:01 1726771681

Man I remember being bit by this in migrating to AWS - this had like snuck through on fast on-prem disks, but as soon as that /tmp was on RDS oh boy, it was a dozy.

opentokix · 2024-09-19T21:02:36 1726779756

Using ebpf, perf and flamegraphs would let him find this in a couple of hours. That was not available for him in 2015 tho.

hinkley · 2024-09-19T19:58:45 1726775925

Stuff like this is why back when I still wrote Java we only wanted to turn on JVM telemetry on production boxes if they were canaries. Slower you can work around by deploying more copies. But jitter is not something you can do much about.

smrtinsert · 2024-09-19T15:25:58 1726759558

Is this account a submission bot of some sort?

throwaway04324 · 2024-09-19T15:36:31 1726760191

The account seems to be connected to a real person, but it has a high number of submissions (over 350 submissions the past 30 days)

geodel · 2024-09-19T15:26:56 1726759616

Also spending a lot cause higher credit card bills.