
Using JDK 9 Memory Order Modes - BenoitP
http://gee.cs.oswego.edu/dl/html/j9mm.html
======
aardvark179
> This guide is mainly intended for expert programmers familiar with Java
> concurrency, but unfamiliar with the memory order modes available in JDK 9
> provided by VarHandles.

I think it's important to emphasise this. You should think extremely carefully
before reaching for features like these and should have a very good reason for
not using the existing concurrency tools and data structures. That said, if
you do need to build your own concurrent data structures then these do provide
excellent primitives with which to do it.

~~~
derefr
Right; the proper consumer of material like this (for anything other than
intellectual-curiosity purposes) isn't your regular line-of-business
developer; it's more the developers of the concurrency libraries that such
developers consume, e.g. actor frameworks like Akka or Vert.x.

~~~
aardvark179
I was going to say it was even more limited than that, but Vert.x does seem to
have quite a few uses of AtomicLong, AtomicInteger, and AtomicBoolean which could be
replaced by primitive fields and VarHandles. Whether it's worth it depends on
how many of them are actually used at run time and whether removing the extra
indirection and footprint is important.

It also makes heavy use of the JDK's concurrent maps, latches, and other
structures which it would probably be unwise to try and re-implement.
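
A minimal sketch of what such a replacement might look like. `RequestStats`
and its `count` field are hypothetical names made up for illustration, not
anything from Vert.x; the point is only that the counter becomes a plain
`long` field with a static VarHandle instead of a separate AtomicLong object.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;

// Hypothetical class for illustration; not taken from Vert.x.
class RequestStats {
    private static final VarHandle COUNT;
    static {
        try {
            COUNT = MethodHandles.lookup()
                    .findVarHandle(RequestStats.class, "count", long.class);
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    // A plain primitive field: no separate AtomicLong object per instance.
    private long count;

    // Atomic increment with volatile semantics, like AtomicLong.incrementAndGet().
    long increment() {
        return (long) COUNT.getAndAdd(this, 1L) + 1L;
    }

    long current() {
        return (long) COUNT.getVolatile(this);
    }

    // Runs `threads` threads that each increment `perThread` times.
    static long hammer(int threads, int perThread) {
        RequestStats stats = new RequestStats();
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                for (int i = 0; i < perThread; i++) stats.increment();
            });
            workers[t].start();
        }
        for (Thread w : workers) {
            try {
                w.join();
            } catch (InterruptedException e) {
                throw new IllegalStateException(e);
            }
        }
        return stats.current();
    }

    public static void main(String[] args) {
        // No increments should be lost, so this is threads * perThread.
        System.out.println(hammer(4, 10_000));
    }
}
```

Whether this beats AtomicLong in practice would need measuring; the sketch
only shows the shape of the change.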

~~~
axiak
Those uses of AtomicFoo classes could probably be updated to use
AtomicFooFieldUpdater fields (e.g. AtomicIntegerFieldUpdater). VarHandle is
like an atomic field updater, but with all of the concurrency semantics that
were previously accessible only via Unsafe.
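
For comparison, a small sketch of the field-updater style. The `Connection`
class and its `state` encoding are assumptions for illustration only.

```java
import java.util.concurrent.atomic.AtomicIntegerFieldUpdater;

// Hypothetical class for illustration.
class Connection {
    private static final AtomicIntegerFieldUpdater<Connection> STATE =
            AtomicIntegerFieldUpdater.newUpdater(Connection.class, "state");

    // Field updaters require a volatile field of the matching type.
    private volatile int state; // 0 = open, 1 = closed

    // Returns true only for the one caller that actually closes the connection.
    boolean close() {
        return STATE.compareAndSet(this, 0, 1);
    }

    boolean isClosed() {
        return state == 1;
    }
}
```

The updater only offers the operations of the Atomic classes; weaker modes
such as acquire/release or opaque access are what VarHandle adds on top.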

~~~
_old_dude_
In Java 9, all updaters have been rewritten in terms of VarHandle. All
reference updaters are less efficient than their VarHandle counterparts
because they rely on generics.

In Java, generics are erased at runtime, so the reference updater code has
some supplementary runtime checks.

~~~
sreque
This isn't correct. The atomic updater classes don't rely on generics to do
anything. What they do is perform safety checks on every access to ensure you
don't segfault your JVM, resulting in heavy performance losses compared to
sun.misc.Unsafe.

~~~
xxs
Actually it is true (about generics and erasure) when it comes to
AtomicReferenceFieldUpdater. The check is likely to be around 1 cpu cycle
since the branch is virtually always predicted.

~~~
_old_dude_
It can be one mov + cmp + je in assembly, but that's not always the case.

It's an instanceof check. If the class of the field is final (or can be proven
to be effectively final by CHA), then it's a null check followed by a pointer
check. Theoretically, the null check can be proven unnecessary, be fused with
another, or be transformed into an implicit null check. But in the specific
case of an AtomicReferenceFieldUpdater, null is usually a valid value (for
example, null is used as a tombstone in many concurrent data structure
implementations), so it's usually more than just a getClass() and a pointer
check.

~~~
sreque
From what I've read, until recently this wasn't actually the case for the
field updater classes. Instead, Java does things like not trusting your
fields' finality and not treating anything as constant. Let me see if I can
find the blog post.

Here we go: [https://www.google.com/webhp?sourceid=chrome-
instant&ion=1&e...](https://www.google.com/webhp?sourceid=chrome-
instant&ion=1&espv=2&ie=UTF-8#q=atomic+field+updater+optimizations+java+8&*)

------
davidtgoldblatt
tl;dr for people familiar with the C/C++11 MM: It's very similar. `relaxed` ->
`Opaque`, `seq_cst` -> `volatile`. `acquire` and `release` map more or less
the same.

Interestingly, there's no equivalent to C++ `acq_rel` on the RMW operations --
you have to either choose the stronger `volatile` ordering, or do a release
fence followed by an acquire RMW (or the reverse).

Regular loads/stores may be mixed with atomic ones, but don't give any
coherence or forward progress guarantees, and may see word tearing for longs
or doubles.

The Java `fullFence()` is stronger than a C++ `seq_cst` one; inserting one
between every pair of Opaque accesses gives them sequential consistency
(though, I understand that C++ is probably going to strengthen the semantics
of `seq_cst` fences eventually).
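
A rough sketch of how that mapping looks in code. `Mailbox` and its fields
are hypothetical names; the C++ equivalents in the comments are the
approximate correspondences described above, not exact ones.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;

// Hypothetical class for illustration.
class Mailbox {
    private static final VarHandle READY;
    private static final VarHandle SEQ;
    static {
        try {
            MethodHandles.Lookup l = MethodHandles.lookup();
            READY = l.findVarHandle(Mailbox.class, "ready", boolean.class);
            SEQ = l.findVarHandle(Mailbox.class, "seq", int.class);
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    private int payload;   // plain field, published via `ready`
    private boolean ready; // accessed only through READY
    private int seq;       // accessed only through SEQ

    void publish(int value) {
        payload = value;
        READY.setRelease(this, true); // roughly a C++ release store
    }

    // Returns the payload, or -1 if nothing has been published yet.
    int tryTake() {
        if ((boolean) READY.getAcquire(this)) { // roughly a C++ acquire load
            return payload;
        }
        return -1;
    }

    boolean peekOpaque()   { return (boolean) READY.getOpaque(this); }   // ~ relaxed
    boolean peekVolatile() { return (boolean) READY.getVolatile(this); } // ~ seq_cst

    // There is no acq_rel RMW mode; one of the two options described above
    // is a release fence followed by an acquire RMW.
    int nextSeq() {
        VarHandle.releaseFence();
        return (int) SEQ.getAndAddAcquire(this, 1);
    }

    // Hands `value` (must not be -1) to a spinning consumer thread.
    static int handOff(int value) {
        Mailbox box = new Mailbox();
        int[] seen = new int[1];
        Thread consumer = new Thread(() -> {
            int v;
            while ((v = box.tryTake()) == -1) {
                Thread.onSpinWait();
            }
            seen[0] = v;
        });
        consumer.start();
        box.publish(value);
        try {
            consumer.join();
        } catch (InterruptedException e) {
            throw new IllegalStateException(e);
        }
        return seen[0];
    }
}
```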

~~~
gpderetta
It is not surprising; the work on the C++ memory model was based on the Java
one. I suspect there is a large overlap in the people working on both.

------
exabrial
It's early in the morning, brain not fully functional... what can you do with
a VarHandle that you couldn't do with reflection?

~~~
aardvark179
There are two real differences. The security checks are done at lookup rather
than when the handle is used, and they provide atomic operations and memory
ordering which reflection doesn't give you. Think of them as type-safe
versions of some of the field access methods in sun.misc.Unsafe.
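
A small side-by-side sketch, using a hypothetical `Slot` class: reflection
gives plain reads and writes through a `Field`, while the VarHandle is looked
up once and then offers atomic operations such as compare-and-set.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;
import java.lang.reflect.Field;

// Hypothetical class for illustration.
class Slot {
    private static final VarHandle VALUE;
    static {
        try {
            // Access to the private field is checked here, once.
            VALUE = MethodHandles.lookup()
                    .findVarHandle(Slot.class, "value", int.class);
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    private int value;

    // Reflection: plain get/set only, with no atomic or ordered variants.
    int readViaReflection() {
        try {
            Field f = Slot.class.getDeclaredField("value");
            return f.getInt(this);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    // VarHandle: atomic compare-and-set on the same field.
    boolean casViaVarHandle(int expected, int updated) {
        return VALUE.compareAndSet(this, expected, updated);
    }
}
```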

~~~
_old_dude_
VarHandles also use the polymorphic signature trick to avoid the
boxing/unboxing of arguments that you get with reflection (the same trick
would also allow 64-bit array indices to be introduced in the future, BTW).

You can also do all memory operations on array-like values (arrays or
ByteBuffers), even with unaligned access.
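
A short sketch of the array-like case, with a hypothetical `ArrayishAccess`
class. One caveat as far as I know: on a ByteBuffer view, a misaligned index
is only supported for the plain `get` and `set` modes, so the example sticks
to those for the unaligned access.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical class for illustration.
class ArrayishAccess {
    // Element access on an int[]: coordinates are (array, index).
    private static final VarHandle INT_ELEMENT =
            MethodHandles.arrayElementVarHandle(int[].class);

    // View a ByteBuffer as ints: coordinates are (buffer, byte offset).
    private static final VarHandle INT_VIEW =
            MethodHandles.byteBufferViewVarHandle(int[].class, ByteOrder.BIG_ENDIAN);

    // Atomically replaces element 2 (initially 0) with 99 and reads it back.
    static int casElement() {
        int[] data = new int[4];
        boolean swapped = INT_ELEMENT.compareAndSet(data, 2, 0, 99);
        return swapped ? (int) INT_ELEMENT.getVolatile(data, 2) : -1;
    }

    // Writes and reads an int at byte offset 1, which is not 4-byte aligned.
    static int unalignedRoundTrip() {
        ByteBuffer buf = ByteBuffer.allocate(8);
        INT_VIEW.set(buf, 1, 0x01020304);
        return (int) INT_VIEW.get(buf, 1);
    }

    // Big-endian layout: the most significant byte lands at offset 1.
    static byte firstByteWritten() {
        ByteBuffer buf = ByteBuffer.allocate(8);
        INT_VIEW.set(buf, 1, 0x01020304);
        return buf.get(1);
    }
}
```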

~~~
Flowdalic
Care to expand on that "polymorphic signature trick"?

~~~
aardvark179
There is an annotation that can be used in the JDK to mark methods as
polymorphic. The compiler will declare the method call with the types of the
arguments and result where it is called, and the VM will handle it. This
allows a very small set of core methods to take variable numbers or types of
arguments without boxing them or having to put them in an array.
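
A minimal sketch of the effect, using `MethodHandle.invokeExact`, which is
one of those polymorphic methods. `PolymorphicCall` is a hypothetical name;
`Math.addExact` is just a convenient target.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;
import java.lang.invoke.VarHandle;

// Hypothetical class for illustration.
class PolymorphicCall {
    static int add(int a, int b) {
        try {
            MethodHandle mh = MethodHandles.lookup().findStatic(
                    Math.class, "addExact",
                    MethodType.methodType(int.class, int.class, int.class));
            // The compiler records this call site as (int,int)int, taken from
            // the argument types and the cast, so nothing is boxed.
            return (int) mh.invokeExact(a, b);
        } catch (Throwable t) {
            throw new IllegalStateException(t);
        }
    }

    // A VarHandle reports the primitive types it expects per access mode:
    // for SET on a long[] that is (long[], int, long)void.
    static Class<?> setValueType() {
        VarHandle vh = MethodHandles.arrayElementVarHandle(long[].class);
        return vh.accessModeType(VarHandle.AccessMode.SET).parameterType(2);
    }
}
```

Compare this with `Method.invoke(Object, Object...)`, where the same call
would box both ints and wrap them in an array.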

------
sreque
A couple of examples in RA mode involving constructors and final fields look
unnecessary to me. I thought the Java memory model already guaranteed a
release/acquire fence at the end of a constructor for final fields. Basically,
if a thread sees an object post-construction, it is already guaranteed to see
the correct values for the object's final fields. Am I mistaken here?

~~~
mnw21cam
Indeed. It is possible to see an object half-way through construction if you
are viewing it unsynchronised from a different thread to the thread
constructing it.

This is why the common pattern of double-checked locking is bad. This is:

    
    
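      class Thing {
          // Not volatile, so the unsynchronized read below is a data race.
          private Thing thing = null;
          public Thing getThing() {
              if (thing == null) {            // first check, without the lock
                  synchronized (this) {
                      if (thing == null) {    // second check, with the lock
                          thing = new Thing();
                      }
                  }
              }
              return thing;
          }
      }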
    

Don't do that. You could get hold of a partially-constructed Thing.
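
For reference, a sketch of the usual repair under the post-Java-5 memory
model: make the field volatile. `LazyHolder` and its nested `Thing` are
hypothetical names used to keep the example self-contained.

```java
// Hypothetical classes for illustration.
class LazyHolder {
    static final class Thing {
        int a = 1; // non-final state that a racy reader could otherwise miss
        int b = 2;
    }

    // volatile: the write publishes the fully constructed Thing, and the
    // unsynchronized read in getThing() is ordered after that write.
    private volatile Thing thing;

    Thing getThing() {
        Thing local = thing;              // single volatile read on the fast path
        if (local == null) {
            synchronized (this) {
                local = thing;            // re-check under the lock
                if (local == null) {
                    local = new Thing();
                    thing = local;
                }
            }
        }
        return local;
    }
}
```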

~~~
sreque
It's actually not possible to get hold of a partially constructed Thing in
this manner, assuming the Thing only has final fields. See
[http://g.oswego.edu/dl/jmm/cookbook.html](http://g.oswego.edu/dl/jmm/cookbook.html).

"The initial load (i.e., the very first encounter by a thread) of a final
field cannot be reordered with the initial load of the reference to the object
containing the final field. This comes into play in: x = sharedRef; ... ; i =
x.finalField; A compiler would never reorder these since they are dependent,
but there can be consequences of this rule on some processors."

Note that there are no constraints placed on sharedRef. It does not need to be
volatile, for instance.
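
A minimal sketch of the situation being described, with hypothetical names.
It assumes `this` does not escape the constructor, which the final-field
guarantee depends on.

```java
// Hypothetical classes for illustration.
class FinalFieldPublication {
    static final class Point {
        final int x;
        final int y;

        Point(int x, int y) {
            this.x = x;
            this.y = y;
        }
    }

    // Deliberately neither volatile nor guarded by a lock.
    static Point shared;

    static void writer() {
        shared = new Point(3, 4);
    }

    // A racing reader may see null, but if it sees the Point at all, the
    // final fields hold their constructed values: the result is -1 or 7.
    static int reader() {
        Point p = shared;
        return p == null ? -1 : p.x + p.y;
    }
}
```

If `x` or `y` were not final, a reader racing with `writer()` would have no
such guarantee.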

This link also has more useful info and examples:
[https://www.cs.umd.edu/~pugh/java/memoryModel/jsr-133-faq.ht...](https://www.cs.umd.edu/~pugh/java/memoryModel/jsr-133-faq.html#finalRight)

~~~
chrisseaton
> there can be consequences of this rule on some processors

Right - I think the processor can still reorder. I've never been sure why the
compiler is not allowed to reorder in this case, since this doesn't help you
if the processor will do it anyway.

~~~
sreque
The "consequences on some processors" simply means that a conforming Java
compiler/VM may need to insert memory barriers to enforce the memory model for
constructors, resulting in a performance penalty. A conforming compiler or VM
cannot generate code that would allow processor re-orderings to break the
memory model.

------
th0ma5
A lot of great topics are covered in this.

