
My favorite bug: segfaults in Java - luu
https://lukeshu.com/blog/java-segfault.html
======
DerekL
This isn't a bug in the compiler or the JVM. The Java spec explicitly allows
this optimization to happen. From The Java Language Specification (SE 7),
section 12.6.1:

> Optimizing transformations of a program can be designed that reduce the
> number of objects that are reachable to be less than those which would
> naively be considered reachable. For example, a Java compiler or code
> generator may choose to set a variable or parameter that will no longer be
> used to null to cause the storage for such an object to be potentially
> reclaimable sooner.

Also, the "this" keyword is treated like a parameter. (See the end of section
2.6.1 of the JVM spec.)
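
To make the spec's point concrete, here is a minimal sketch of the hazard (class and field names are hypothetical, and the "native memory" is just a long field standing in for a real pointer):

```java
// Sketch of the hazard: `this` is just another reference, and once the JIT
// sees no further use of it, the object may be collected (and finalized)
// before the method returns. The long field stands in for a native pointer.
class NativeFrame {
    private long nativePtr = 1L; // pretend this came from native malloc()

    @Override
    protected void finalize() {
        nativePtr = 0L; // pretend this free()s the native memory
    }

    long getFrame() {
        long child = nativePtr; // a copy of the pointer, not a reference to `this`
        // From here on, nothing keeps `this` alive: the collector may finalize
        // it, after which `child` refers to freed native memory.
        return child;
    }
}
```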

~~~
divs1210
so then why do you think the segfault is occurring?

~~~
ubercow13
Because the author of the code didn't take this part of the language spec
into account.

------
amit-bansil
I've done a fair bit of work interfacing memory-managed languages like Java
and Python with "unsafe" native code, and I've found that this problem crops
up often.

The first issue is that finalizers should NEVER be relied upon to clean up
system resources the way you might rely on deterministic destruction in a
language like C++. Instead, expect clients of interfaces that use system
resources to manually call some 'close'-like method. Use finalizers to warn
clients that they are leaking. Josh Bloch has an excellent writeup on
this[†]:

[http://www.informit.com/articles/article.aspx?p=1216151&seqN...](http://www.informit.com/articles/article.aspx?p=1216151&seqNum=7)
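
A minimal sketch of that pattern (the class name is hypothetical; a real implementation would release an actual native resource in close()):

```java
// Sketch of the pattern described above: clients must call close()
// explicitly; the finalizer is only a leak *detector*, never the primary
// cleanup path.
class NativeHandle implements AutoCloseable {
    private boolean closed = false;

    @Override
    public void close() {
        closed = true; // a real implementation would release the native resource here
    }

    boolean isClosed() {
        return closed;
    }

    @Override
    protected void finalize() {
        if (!closed) {
            System.err.println("LEAK: NativeHandle was never closed");
            close(); // best-effort safety net, not something to rely on
        }
    }
}
```

Used with try-with-resources, the close() call is hard to forget: `try (NativeHandle h = new NativeHandle()) { ... }`.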

The second thing, as I believe @btown is saying, is that the child object
seems to depend on memory that it doesn't own a strong reference to. I bet the
whole memory graph here is just a tangled mess. I used to think that that was
just the nature of graphs. When I first started working with GC that couldn't
handle these kinds of cyclical structures without being explicitly told, I
thought that it was an absurd premature optimization. Then, this wise old
hacker peered at my code, smacked me on the back of the head, and grumbled
that if I couldn't clearly explain the reference hierarchy it wasn't just the
software that would be befuddled, it was too complicated for his old brain as
well.

I'm sure that this can't be true, but I am starting to suspect that all the
tremendous work that has gone into fast pause-less concurrent GC has been for
naught. I wish that when GC encountered a cycle it would just tell the
programmer that they now have a leak, maybe even suggest where it could be
fixed with one of those neat little IDE quick fixes, and continue chugging
along.

tl;dr When you have a hard problem let someone else deal with it.

[†] Granted, Bloch does make an exception for 'native peers'. My understanding
of that pattern, however, is that the key is not letting some other object
walk off with a reference to your peer, which is exactly what has gone wrong
here.

(Aside: Josh Bloch's Effective Java & Java Puzzlers are fantastic. They are
worth reading for programmers who use any language.)

~~~
Doradus
"I wish that when GC encountered a cycle it would just tell the programmer
that they now have a leak..."

Wow. Overreact much? :-)

Garbage collection works great for one use case: managing heap memory. Please,
let's not return to a universe where you need to free each chunk of memory
piecewise. That makes freeing memory O(n) in the number of chunks freed, where
modern collectors can free unlimited numbers of objects in O(1) time. Compared
to this, malloc and free are a little like keeping score in soccer by tracking
every action that does not result in a ball entering a net.

The trouble is finalizers. Finalizers suck. They tie the management of one
resource (say native memory or file handles) to an unrelated resource (heap
memory). This has the terrible consequences highlighted by this story, plus
others. It keeps the native resource tied up until the collector decides to
run, for one thing. It also harms performance in secondary ways even when the
finalizer isn't running. For example, finalizers typically defeat escape
analysis, meaning objects with finalizers can't be stack-allocated.

It's a good idea to use finalizers mostly for detecting invalid final states
of objects (like leaking resources), as long as you stay aware that finalizers
can slow down your program in unexpected ways.

[These are my personal opinions, not those of my employer.]

~~~
xamuel
>That makes freeing memory O(n) in the number of chunks freed

A GC can't outdo that. For everything the GC frees, it has to do _some_ work
to answer the question, "Should I free this or not?" Even if that question can
magically be answered with a single CPU instruction, that's still O(n) answers
in the number of chunks freed.

~~~
Doradus
I think you're referring to a mark and sweep collector. Java's garbage
collector uses better algorithms. The collector never even looks at objects
that are not referenced from the root set, and an allocation operation is
nothing but a pointer bump. Even Cheney's algorithm from the early 1970s works
this way, and collectors have only improved since then.
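
The "allocation is nothing but a pointer bump" claim can be sketched with a toy model (an illustration only, not HotSpot's actual allocator):

```java
// Toy model of bump-pointer allocation in a semispace copying collector:
// allocating is O(1) no matter how much garbage the last collection left
// behind, because dead objects are simply never visited or copied.
class BumpAllocator {
    private final int capacity;
    private int top = 0; // next free offset in the current semispace

    BumpAllocator(int capacity) {
        this.capacity = capacity;
    }

    /** Returns the offset of the new object, or -1 if a collection is needed. */
    int allocate(int size) {
        if (top + size > capacity) return -1; // would trigger a copying GC
        int offset = top;
        top += size; // the entire cost of allocation: one pointer bump
        return offset;
    }

    /** After the live set is copied over, allocation resumes past the survivors. */
    void resetAfterCopy(int liveBytes) {
        top = liveBytes;
    }
}
```

Freeing is implicit: a collection copies the live set into the other semispace and resets the bump pointer, so its cost is independent of how many dead objects were left behind.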

~~~
TheLoneWolfling
Except that the OS zeros memory when it's released...

~~~
Doradus
The JVM doesn't release the memory to the OS when garbage is collected; only
when the heap shrinks. Any zeroing the OS might do is proportional to the size
change in the heap, not to the number of objects freed.

~~~
TheLoneWolfling
Good point.

However, that brings up a related issue - namely that the JVM requires all (or
close enough to all) object memory to be initialized. And given that the
amount of memory required to be initialized is at minimum linear w.r.t. the
number of objects required to be initialized...

Work done is still O(n) minimum w.r.t. the number of objects allocated
regardless.

~~~
Doradus
Well, it's hard to deny that the work done by the program is O(n) (or even
Ω(n)) in the number of objects created by the program. That's almost
tautological.

The thing that is interesting about the asymptotic complexity of garbage
collection, though, is that it gives you a peek into the future of GC overhead
as heaps get larger.

Consider a program with some bounded steady-state memory consumption, and a GC
that takes O(1) time to free unlimited amounts of memory. In such a system,
you can reduce your GC overhead arbitrarily simply by enlarging the heap. A
larger heap gives a larger time interval between collections, yet doesn't make
the collections any slower. Don't like 6% overhead? Fine. Make the heap 10x
larger and you have 0.6% overhead.

This time-space trade-off is always available to the end user of such a
system. It's certainly not available with more primitive systems like
malloc+free.
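
As a back-of-the-envelope model of that trade-off (assuming, per the above, that a collection costs work proportional only to the live set; the class and method names are made up for illustration):

```java
// Back-of-the-envelope model: a copying collector does work proportional to
// the live set at each collection, and a collection happens once per
// (heapSize - liveSet) bytes allocated. GC cost per byte allocated therefore
// falls as the heap grows, with no change to the program itself.
class GcOverheadModel {
    static double overheadPerByteAllocated(double liveSet, double heapSize) {
        double allocationBetweenGCs = heapSize - liveSet; // room refilled each cycle
        return liveSet / allocationBetweenGCs;            // copy work per byte allocated
    }
}
```

In this model a roughly 10x larger heap gives roughly 10x less GC overhead, matching the 6% to 0.6% example above.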

[These are my personal opinions, not those of my employer.]

~~~
TheLoneWolfling
_It doesn't matter_ what the time to free memory is: as long as it is O(n)
w.r.t. the number of objects (or less), the limiting factor is allocation. In
steady state (as you are assuming), all memory freed will be reused,
(effectively) all memory used will be written to at least once, and the memory
used by a group of objects is at least O(n) w.r.t. the number of objects.

So no, I don't see the advantage you describe. I see that, in the _best_
case, it's on par with non-GC'd systems, asymptotically speaking.

And yet, I see plenty of disadvantages. To start:

I have yet to see a GC that's O(1) w.r.t. the amount of memory to free. Best
case? I'll grant you that, for an extremely naive copying collector. But the
chance of that best case occurring is... not significant. Especially with the
more sophisticated algorithms.

And again, as above, it doesn't matter how long it takes to free memory. As
long as it's linear or sublinear, it'll be dominated, asymptotically speaking,
by how long it takes to initialize that memory again. And in steady-state
operation it will be reinitialized.

Not to mention the practical effects.

For instance: GC thrashes cache. Badly. It completely kills the concept of a
working set. You can't work around that by making your heap larger - if
anything it makes it worse. It makes the time period between thrashings
longer, yes, but makes them worse when it does happen.

GC also doesn't scale well to large numbers of cores. In particular, your
worst case "always" will be when the GC has to check with every core to see if
it still has a reference to something. And given that processor numbers are
where most of the performance improvements seem to be coming in... You
mentioned memory size improvements but didn't mention increasing numbers of
cores.

GC also (almost always) requires extra header field(s) per object: a minimum
of O(1) overhead per object initialized.

Which means that, generally speaking, the overhead doesn't asymptote to zero.
It asymptotes to a constant - and non-negligible, at least from what I've seen
- amount of overhead.

~~~
Doradus
I seem to have hit a nerve here. Perhaps I haven't been clear about some of
the points I'm making, which I don't think are all that contentious.

I'm not saying a fast GC can reduce the cost of allocation. I'm saying a fast
GC can reduce GC overhead. I think that's an uncontroversial statement. I can
only insist so many times that Java's GC doesn't touch dead objects at any
time during its scan. If you want to disagree with me, that is your
prerogative.

I agree that a GC walk can thrash cache when it happens. However, a copying GC
can also rearrange object layout to improve locality. Which effect is more
significant depends on the workload.

I didn't mention the number of cores because it didn't occur to me that it was
significant to you. GCs can scale just fine to many cores. No core has to
"check" other cores: each core can check its own local data, depending on the
tactics employed to deal with NUMA. There are always challenges in scaling any
parallel workload, of course, and there are always solutions.

Our GC (in IBM's J9 JVM) uses 8 bits in each object. With classes aligned to
256 bytes, there are eight free bits in each object's class pointer. Hence,
J9's object model has one word of overhead on top of the user's data, which is
what malloc/free typically needs to track each object's size anyway. No
disadvantage there.

[These are my personal opinions, not those of my employer.]

------
btown
From the article:

    
    
        public type1 getFrame() {
          type2 child = this.getChild();
          type3 var = this.something();
          // `this` may now be garbage collected
          return child.somethingElse(var); // segfault comes here
        }
    

> Where the destructor method of this calls a method that will free() native
> memory that is also accessed by child; if this is garbage collected before
> child.somethingElse() runs, the backing native code will try to access
> memory that has been free()ed, and receive a segmentation fault.

If child needs some block of shared memory contained in the parent (this)...
shouldn't there be a shared object referenced by both parent and child that
owns the block of shared memory? Then as long as child isn't GC'd, even if the
parent/this is GC'd, the shared memory-owner won't be GC'd, and the memory
will be safe.
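
A sketch of that ownership fix (all class names hypothetical; the native memory is simulated by a flag):

```java
// Sketch of the fix: both parent and child hold strong references to a
// separate owner of the shared block, so the block outlives whichever of
// them is collected first.
class SharedBlock {
    private boolean freed = false;

    int read() {
        if (freed) throw new IllegalStateException("use after free");
        return 42; // stands in for reading the shared native memory
    }

    @Override
    protected void finalize() {
        freed = true; // stands in for free()ing the native memory
    }
}

class Parent {
    private final SharedBlock block = new SharedBlock();

    Child getChild() {
        return new Child(block); // the child gets its own strong reference
    }
}

class Child {
    private final SharedBlock block;

    Child(SharedBlock block) {
        this.block = block;
    }

    int somethingElse() {
        return block.read(); // safe even if the Parent has been collected
    }
}
```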

For the C++ equivalent of this pattern, you'd want to scope the shared memory
to a reference-counted RAII resource (say, a shared_ptr), and make sure that
any actor that interacts with it holds a reference for the actor's lifetime.
It's unreasonable for Java to "guess" that you meant anything different from
this pattern - what if you actually wanted the parent object to be eligible
for GC (say, if it's hogging other resources the child doesn't need) between
this.something() and child.somethingElse()?

So I'd consider this a WontFix if I were a JVM developer, and maybe write a
blog post about it and put it on HN if I had the time. Something something
feature not a bug, etc.

EDIT: To be clear, the child is trying to have low-level access to a resource
it has no ownership stake in, implicitly or explicitly. So there's no reason
for that resource to stick around for the child. It's like a spouse taking the
house in a divorce - the "prenup" in Java is that only reference-holders have
any expectation for an object to be accessible to them. If the "child" didn't
want to find itself segfaulted on the streets, it should have thought about
writing that "house" (the shared memory) into its prenup, hmm?

EDIT 2: The analogy kind of fell apart, I think. But my point still stands.
Apologies to any divorced HN'ers I may have offended.

------
pdeva1
This is not a bug. The author doesn't seem to understand how the JVM or
managed memory works. He uses memory from the parent object in child code
without holding a reference to it in the child object. If his code were pure
Java, this simply wouldn't be possible. The only reason he is able to do this
is via native code.

Even in native code, it is wrong to hold a pointer to managed memory without
using the NewGlobalRef/NewLocalRef functions that JNI provides.

In summary, the JVM is behaving correctly with respect to GC. The author
doesn't know how to use JNI, and that is what caused the crash.

~~~
xxxyy
The author has not written the bad code. From what I understand he was just
debugging it.

>The dashboard was written in Java, and the source was available (under a
3-clause BSD license), so I dove in, (...)

Also, nowhere in the post does he say that it is the JVM's fault.

~~~
im2w1l
>The issue was that Java was making an unsafe optimization (I never bothered
to figure out if it is the compiler or the JVM making the mistake, I was
satisfied once I had a work-around).

He says it was the JVM or the compiler's fault. He doesn't blame the code.

~~~
coryfklein
IMO, whether he thought this explicitly or not, he was really just
disagreeing with the Java spec. Of course, effectively any bug you encounter
could be blamed on the spec writers, but in this case I may agree with him. It
depends on how much this particular optimization saves.

------
kppullin
I've faced similar issues in the .NET world. There is a built in method,
`GC.KeepAlive()`, that does nothing other than keep the reference alive to the
eyes of the garbage collector.

Raymond Chen explains it well here:
[http://blogs.msdn.com/b/oldnewthing/archive/2010/08/13/10049...](http://blogs.msdn.com/b/oldnewthing/archive/2010/08/13/10049634.aspx)
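
Java later gained a direct analogue: `java.lang.ref.Reference.reachabilityFence` (Java 9+). A sketch of how it would address the article's pattern (class and field names hypothetical):

```java
import java.lang.ref.Reference;

// Sketch using the Java 9+ analogue of .NET's GC.KeepAlive: the fence keeps
// `this` strongly reachable until it executes, replacing the article's
// fragile `this.getSize()` trick with a documented guarantee.
class Frame {
    private long nativePtr = 7L; // stands in for native memory owned by `this`

    long readNative() {
        try {
            return nativePtr; // stands in for native code touching that memory
        } finally {
            Reference.reachabilityFence(this); // `this` cannot be collected before this point
        }
    }
}
```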

~~~
lamuerteflaca
It completely defeats the whole point of a garbage collector if you have to
resort to hacks like this. I understand why it is done: it's a lot easier to
use the workaround than to fix the root problem.

Unfortunately, these types of bugs have a nasty way of showing up in other
places where they can be even harder to debug. Long term, these bugs need to
be fixed, otherwise your code is just a ticking bomb. The worst part is when
they combine with other bugs and become even harder to debug.

~~~
dietrichepp
I wouldn't say "completely defeats". The hacks are usually there only when you
need to manage the lifetime of something that's not managed by the garbage
collector. I've seen it used for mutexes, for example, to keep a mutex alive
and locked.

You could say that the problem is that people abuse the garbage collector to
manage something it shouldn't manage, or you could say that the problem is
that garbage collectors are bad at managing anything that isn't memory. (I
would lean towards the latter.)

------
maweki
I am very disappointed that `this.getSize();` at the end fixes the issue.
Since it has no side effects and the value is never used anywhere, I would
expect the compiler to remove that call.

~~~
tormeh
The Java compiler does almost no optimization. I think it's an idealistic
thing: All optimization is supposed to be done by the JVM. Personally I think
that sounds stupid.

~~~
Terr_
The compiler doesn't (shouldn't) know what the runtime system will look like,
so it cannot make optimizations as safely or effectively as the JVM can.

~~~
TheLoneWolfling
Except that, as evidenced by this case, the JVM isn't doing the optimization
either!

------
TheLoneWolfling
Sounds like a particularly nasty bug.

Especially as there's nothing preventing the "fix" from being optimized out by
the JVM, which would reintroduce the problem all over again.

~~~
hawkice
The issue is an unsafe optimization. If a change to the optimizer reintroduces
the bug, that's a pretty straightforward regression.

~~~
TheLoneWolfling
I meant the author's workaround of adding a call to `this.getSize()`, not the
JVM fix.

~~~
hawkice
Ah, yes, the instability of fake fixes to compiler bugs. I recall a friend of
mine once had a bit of C code that would compile and run correctly only if
debugging was enabled. His fix, to get it to run with full optimizations
without segfaulting, was what he referred to as "the God code": `int god = 1;`
To this day no one knows why it did anything at all, but at some point the
compiler bug was fixed without anyone filing this bug, as far as we know.

~~~
TheLoneWolfling
May not even be a compiler bug in that case.

That's a classic symptom of (and "fix" for) an array that was overrun or
underrun by one (or some small number of) elements: the extra allocated space
on the stack may have prevented it from walking into unallocated memory (and
segfaulting).

Now, if he'd run it under valgrind and it showed no problems of that sort, and
it still segfaulted...

~~~
hawkice
My understanding is that it was reversed from that: with stack canaries and
debug tracing, nothing odd was happening, but when things were run without
those defenses it _stopped_ working, instead of starting to work because
you're paying less attention. That was why it was so odd -- obviously an
array overrun (or passing (void*) &local_var into subroutines where it's cast
to a different type, etc.) would be a pretty obvious candidate.

------
jryan49
If child needs this then why is it being free()d in the first place? I don't
get what's so strange about the behavior here.

------
gizmo686
I remember this bug. Our workaround was to parse the output of toString() for
0x0000000, and simply drop the frame when it happens.

~~~
threeseed
I'm curious. Was it something that existed across many JVM updates? Given
the stability of the platform, on the rare occasions I have seen an issue
there's always been a choice of 10-20 JVM versions to pick from.

~~~
gizmo686
The competition gives us six weeks to build the robot, before we have one or
two three-day competitions. Apart from possibly attending non-official
off-season competitions, that is the only time we would use the code before
the next year's competition, which is completely different, and for which we
are required to write entirely new code[1].

Once we got a hack working, we ran with it and rarely tried to clean up. In
this case, we didn't bother trying a different JVM.

[1] Unless the code is open source (as ours was), in which case we were
explicitly allowed to use it.

------
deepsun
Do not implement finalize() with side effects. Better to not implement it at
all, as recommended everywhere.

------
Hermel
One more reason not to use "finalize". Did you know that there is no
guarantee that it ever gets called? That's because there is no guarantee that
objects will be garbage collected as long as there is enough memory.

------
greenleafjacob
Team 1024! Always an easy name to remember. Hello from team 166!

------
kdrakon
I only have an amateur interest in JVM development, but doesn't this bug break
some rules around GC 'object roots'? Even if 'child' and 'something' are
referentially copied to local variables on the calling stack, I believe 'this'
should continue to act as a root for both of them. I think.

~~~
adevine
Not necessarily. Those "getChild()" and "something()" methods could, for
example, return new objects. 'this' is only the root of child and something if
it retains references (i.e., member variables) to those objects, which I'm
guessing was not the case in this example.

~~~
kdrakon
Ah, yes, didn't consider that. But what about 'this'? For what reason could
the JVM choose to GC an object whose method the calling thread is currently
executing? The example method exists in an instance of 'this' and isn't static.

~~~
Dylan16807
Execution in Java does not have a concept of "current object". 'this' is
passed in a magic way, but it does not exist in a magic way. It's like any
other variable, and can be collected as soon as it's not needed. Which can be
extremely early, because optimizations can reduce the 'need' for a variable to
only its side effects.

Go pick apart aggressively optimized and inlined code and I bet you can find
situations where 'this' never even exists.

------
sorokod
I think that

    
    
       this.getSize(); // bogus call to keep `this` around
    

could be optimized away since the value is never used.

------
lamuerteflaca
That is a huge bug. I hope it's been fixed.

~~~
threeseed
It was from 2 years ago, i.e. the early Java 7 days.

There have been 116 updates since then (33 public). Given that segfaults
almost never happen in the JVM (I've never seen one), I imagine fixing it
would have been the highest priority.

~~~
xxxyy
This is not a Java bug.

~~~
mlvljr
How dare you.

