
The Four Month Bug: JVM statistics cause garbage collection pauses - osi
http://www.evanjones.ca/jvm-mmap-pause.html
======
caf
_I don't know exactly why the Linux kernel does this, but the pauses do not
seem to occur for reads, so the Linux kernel is marking these pages read-only.
A friend suggested this is the kernel's way of reducing the write I/O rate
under overloaded conditions. If you know exactly why the kernel is doing this,
I would love to hear it._

What is happening is that the JVM is dirtying a previously clean page (this
also happens in your test program, because the dirty pages in your mmap'ed
file are being regularly written out - and therefore made clean - by
background writeback).

If, at this point, the global dirty limits are exceeded
(/proc/sys/vm/dirty_bytes and /proc/sys/vm/dirty_ratio) then the task will be
paused to throttle the generation of dirty pages.
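caf's scenario can be sketched in a few lines of Python (illustrative only, not the JVM's actual code): dirty an mmap'ed page, write it out so it becomes clean, then dirty it again. That second store is the kind of write that dirty-page throttling can pause.

```python
import mmap
import os
import tempfile

# Sketch: dirty a page, let writeback make it clean, then re-dirty it.
fd, path = tempfile.mkstemp()
try:
    os.ftruncate(fd, 4096)   # one page
    m = mmap.mmap(fd, 4096)
    m[0] = 1                 # first store: the page becomes dirty
    m.flush()                # written out: the page is clean again
    m[0] = 2                 # re-dirties the clean page; under memory
                             # pressure this is the store that can stall
    value = m[0]
    m.close()
finally:
    os.close(fd)
    os.unlink(path)

print(value)
```

Whether the re-dirtying store actually stalls depends on the global dirty limits above, so this only shows where the stall would occur, not the stall itself.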

------
jcalvinowens
This is by design; these are called "stable pages".

In some circumstances, the kernel has to ensure that pages aren't modified
between initiating and completing the writeback for that specific page.

Btrfs is a copy-on-write filesystem, so it ends up needing to use this
guarantee more often than the others. This is something the btrfs developers
are actively working on improving.

Here's a good article describing this in a bit more detail:
[https://lwn.net/Articles/442355/](https://lwn.net/Articles/442355/)

EDIT: Fix a typo

------
dicroce
BTW, I'm pretty sure that this blocking behavior is the best choice (among bad
choices). If I memory-map a file, I can set bits in that buffer at a much
higher rate than the kernel can write those bits to disk... So the choices
are: block the writing thread until the kernel can catch up, or simply drop
some of those writes... At least with blocking my writes, I have a chance to
notice the issue (in my application, this would cause a chain reaction of
threads to back up and ultimately block, resulting in me dropping network
packets)...
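The two choices can be modeled with a bounded queue standing in for the kernel's dirty-page budget (the names here are illustrative, not kernel APIs): either the producer blocks until the writer catches up, or writes are dropped.

```python
import queue

# A tiny budget, for illustration.
budget = queue.Queue(maxsize=2)

budget.put("write-1")
budget.put("write-2")        # the budget is now full

# Choice 1: budget.put("write-3") would block here until something drains it.
# Choice 2: drop the write, which at least the application can observe:
dropped = False
try:
    budget.put_nowait("write-3")
except queue.Full:
    dropped = True

print("dropped:", dropped)
```

Blocking is the visible failure mode; dropping would lose data with no signal at all.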

~~~
Nelson69
The writes block? How can that be?

The only guarantee when you write to an mmap'ed page is that you wrote to the
memory; whether or not it makes it to disk is up to many different things. So
before you can write to the memory, it needs to have the right contents: that
can mean a read has to finish, and it can also mean pages have to get murdered
to free up memory for you to read into. I can't think of how the write itself
can actually block unless a read is required which hasn't finished (like the
file is in read/write mode or something); in fact, other than a page fault,
there is no way it can be a blocking operation, since the pages are stitched
into the process's page tables. At least I can't think of how it can block on
a write right now; I've had a couple glasses of wine with dinner though.
[edit] An mtime update makes some sense, but does that block?

In write-only mode there are optimizations to not require the read.

~~~
toast0
> I can't think of how the write itself can actually block unless a read is
> required which hasn't finished (like the file is in read/write mode or
> something); in fact, other than a page fault, there is no way it can be a
> blocking operation, since the pages are stitched into the process's page
> tables.

You've almost got it. The mmap'ed pages may be in the process page table, but
they may be mapped read-only: if the process tries to write to the page, it
traps into the kernel. If there are few dirty pages, the kernel will mark the
page dirty, make it writable by the process, and make the process runnable
again. Apparently, if there are a lot of dirty pages, the kernel will not
fulfill the request immediately; it will wait. While it's waiting, the process
is not runnable (other threads in the same address space would continue to be
runnable).
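This can be made concrete by timing two stores (a rough sketch, assuming Linux-style mmap semantics; the timings are illustrative, not a benchmark): after a flush makes the page clean, the next store traps into the kernel, while a store to an already-dirty page is a plain memory write.

```python
import mmap
import os
import tempfile
import time

fd, path = tempfile.mkstemp()
os.ftruncate(fd, 4096)
m = mmap.mmap(fd, 4096)

m[0] = 1                     # populate and dirty the page
m.flush()                    # writeback: the page is clean again

t0 = time.perf_counter_ns()
m[0] = 2                     # clean -> dirty: traps into the kernel
t1 = time.perf_counter_ns()
m[1] = 2                     # already dirty: no trap
t2 = time.perf_counter_ns()

first, second = t1 - t0, t2 - t1
print("re-dirtying store:", first, "ns; following store:", second, "ns")

m.close()
os.close(fd)
os.unlink(path)
```

On a loaded system the first store is where the long pause shows up, because the kernel may choose to throttle instead of returning immediately.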

~~~
TheCondor
So a page fault

------
616c
You know, it is pretty funny: some people on Reddit linked to a talk from
aaronsw (AKA tenderlove) about Ruby on Rails performance issues and
performance regressions. In the middle of the video he cracks up and points
out that he went down a tangent because that Ruby profiler was built on a gem
that called into Ruby MRI's C API. So he worked his way through the gem
developer, and then the Ruby dev who wrote the C API. Neither knew what was
going on.

[https://www.youtube.com/watch?feature=player_detailpage&v=JM...](https://www.youtube.com/watch?feature=player_detailpage&v=JMGmaRZtgM8#t=1450)

Turns out the discrepancy between CPU time and wall time in the profiling data
only appeared on OS X, because of a problem with a trap() call on OS X
specifically, not any other platform. His moral of the story: even profilers
have bugs.

[https://www.youtube.com/watch?feature=player_detailpage&v=JM...](https://www.youtube.com/watch?feature=player_detailpage&v=JMGmaRZtgM8#t=1608)

I think I am seeing a pattern today.

~~~
0x0
ps I think you got some aarons mixed up

~~~
616c
Yes, I did get the Aarons mixed up. Thanks to both of you for noticing.

------
rdtsc
Wonder if it deals with dirty writeback. That is a Linux behavior for writing
dirty pages to disk. If writes come in at a high enough rate, Linux will
hard-block the writing thread until pages are written to disk.

Before that it usually spawns a bunch of pdflush processes to flush data out
in the background, but if those can't keep up then it moves to blocking the
process. On older systems with older spinning drives, blocks could even take
seconds.

See /proc/meminfo for these two entries:

       Dirty:                 4 kB
       Writeback:             0 kB

Dirty is the current amount of dirty pages, and Writeback is the amount
currently being written out.
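A small helper for pulling those two numbers out of /proc/meminfo-style text (the function name is made up; it's run here on a pasted sample so it works anywhere):

```python
def parse_meminfo(text):
    """Map 'Key:   N kB' lines to {key: kilobytes}."""
    result = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        fields = rest.split()
        if sep and fields:
            result[key.strip()] = int(fields[0])
    return result

sample = """Dirty:                 4 kB
Writeback:             0 kB"""

stats = parse_meminfo(sample)
print(stats)   # {'Dirty': 4, 'Writeback': 0}
```

On a real Linux box you would feed it `open("/proc/meminfo").read()` and watch Dirty climb while a write-heavy workload runs.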

------
hinkley
Good catch.

This would make a good case study in Heisenbugs. GC delays that only happen
when you're collecting GC statistics.

~~~
mappu
I had a hand-assembled binary that was always crashing on an out-of-bounds
memory access. But whenever I loaded it in the debugger, it was always
perfectly fine.

That was the day I learned about _NO_DEBUG_HEAP!

------
ooOOoo
I have not seen any link to a bug report at
[http://bugs.java.com/](http://bugs.java.com/)

Has this been reported?

------
jalcazar
Debuggers, profilers, monitoring tools, etc. have an associated overhead...
how is this a bug?

------
arielweisberg
If HotSpot didn't require global safepoints all the time, this wouldn't be
such a big deal.

I have been wondering if Zing handles threads blocked in memory mapped files
better.

------
b0b0b0b
Would -XX:-UsePerfData also work?

I noticed jankiness in my Eclipse, and adding that flag seemed to help.
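The reason that flag could help: -XX:-UsePerfData stops the JVM from creating its mmap'ed statistics file in the first place. Those files normally live under /tmp/hsperfdata_&lt;user&gt;/&lt;pid&gt; (assuming the default java.io.tmpdir); this sketch just lists any that exist for the current user.

```python
import getpass
import glob

# Look for the JVM's per-process perf-data files for the current user.
pattern = "/tmp/hsperfdata_{}/*".format(getpass.getuser())
perf_files = glob.glob(pattern)
print(pattern, "->", perf_files)
```

If running JVMs show up here while the flag is set, the flag isn't taking effect (note it also breaks tools like jps/jstat that read these files).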

------
zengorilla
Direct reclaim shines again!

~~~
pron
This has to do with the JVM's monitoring, which, among many other things,
monitors memory allocation/deallocation, which just happens to be automatic.
It has nothing to do with automatic vs. manual memory management.

~~~
zengorilla
Direct reclaim is the precise name for what the Linux kernel is doing in the
case called out by the article.

[http://lwn.net/Articles/396561/](http://lwn.net/Articles/396561/)

But thanks for the snark anyway!

~~~
pron
No snark, but I did completely misunderstand your comment[1]. In any case, I
apologize, and I learned something new!

[1] BTW, if you're mentioning a term that isn't widely known, it's helpful to
link to a definition.

~~~
zengorilla
Right on, my comment could have been much clearer. I will do so in the
future :).

------
cfontes
Talk about perseverance... Really inspiring.

------
irascible
I blame oracle

~~~
mrmondo
While Oracle is certainly to blame for a great deal of things in life, your
comment would benefit from some explanation / background regarding this
specific issue; otherwise its value is debatable.

