
Close Encounters of the Java Memory Model Kind - signa11
http://shipilev.net/blog/2016/close-encounters-of-jmm-kind/
======
ajkjk
I'm not done reading this yet, but I have a "does this have a name/does this
exist"-type question after reading the jcstress example (section 1.1):

It seems like it would be possible to enumerate all interesting orderings of
several threads running at once at the threading-engine level, and therefore
do this kind of probabilistic test deterministically.

Of course, the number of possible interleavings of two _n_ -length programs is
very high (2n choose n, since 2n instructions run in total and you choose
which n positions belong to the first thread). But the language definition
also provides some concept of 'commuting' operations. For example, unrelated
writes to two different variables can't have their relative ordering matter.
That should cut down the number of orderings a great deal. And it's possibly
useful even if it's only practical for small programs with few total orderings.

Is there a language or framework that does something like this?
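
For what it's worth, here's a toy version of that idea (hypothetical code, not jcstress or CHESS): model each thread as a list of instructions and recursively explore every interleaving under sequential consistency. For the classic two-instructions-per-thread "store buffering" shape there are C(4,2) = 6 interleavings, and the outcome (0,0) never shows up. Note this only enumerates sequentially consistent orderings; the weak-memory outcomes the article's tests hunt for would also need a model of the reorderings.

```java
import java.util.*;

public class InterleavingEnumerator {
    // Shared state for the store-buffering shape:
    //   T1: x = 1; r1 = y      T2: y = 1; r2 = x
    static int x, y, r1, r2;

    // Each instruction is a Runnable mutating the shared state.
    static final Runnable[] T1 = { () -> x = 1, () -> r1 = y };
    static final Runnable[] T2 = { () -> y = 1, () -> r2 = x };

    // Recursively run every interleaving of T1 and T2, recording (r1, r2).
    static void enumerate(int i, int j, Set<String> outcomes) {
        if (i == T1.length && j == T2.length) {
            outcomes.add("(" + r1 + "," + r2 + ")");
            return;
        }
        // Snapshot state so each branch explores independently.
        int sx = x, sy = y, sr1 = r1, sr2 = r2;
        if (i < T1.length) {
            T1[i].run();
            enumerate(i + 1, j, outcomes);
            x = sx; y = sy; r1 = sr1; r2 = sr2;
        }
        if (j < T2.length) {
            T2[j].run();
            enumerate(i, j + 1, outcomes);
            x = sx; y = sy; r1 = sr1; r2 = sr2;
        }
    }

    public static void main(String[] args) {
        Set<String> outcomes = new TreeSet<>();
        enumerate(0, 0, outcomes);
        // prints [(0,1), (1,0), (1,1)] — (0,0) is impossible under SC
        System.out.println(outcomes);
    }
}
```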

~~~
jstclair
Sounds like you're describing Microsoft Research's CHESS (2007), and a host of
related projects (Google's ThreadWeaver for Java).

See [http://research.microsoft.com/en-us/projects/chess/](http://research.microsoft.com/en-us/projects/chess/) and
[https://code.google.com/archive/p/thread-weaver/wikis/UsersGuide.wiki](https://code.google.com/archive/p/thread-weaver/wikis/UsersGuide.wiki)

~~~
jkot
I wrote a blog post about Thread Weaver some time ago. Not very useful:
[http://www.mapdb.org/news/thread_weaver/](http://www.mapdb.org/news/thread_weaver/)

------
banku_brougham
i barely know some java, and i have heard of concurrency and locks, and
probably understand the idea of threads. yet, i found this to be a thoroughly
enjoyable and mysterious read!

~~~
kovrik
Aleksey Shipilev is a great developer and writer. His articles are a must-read!

------
fpoling
I recently talked with a Microsoft engineer, and his observation was that a lot
of MS customers have trouble running their C# code on ARM. Like Java, C#
does not guarantee much beyond word-sized writes being atomic, allowing a lot
of reordering. Intel CPUs are nice to developers and let a lot of buggy code
work, but ARM is not forgiving, resulting in really hard-to-track bugs.
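
A minimal sketch (hypothetical, not from the article) of the kind of code that bites people here: unsafe publication through a plain boolean flag. On x86 the two stores are typically observed in order, so it appears to work; ARM's weaker model allows the reader to see `ready == true` while `data` is still stale.

```java
// Neither field is volatile, so there is no happens-before edge
// between the writer and the reader.
class UnsafePublish {
    int data = 0;
    boolean ready = false;  // BROKEN: should be volatile for safe publication

    void writer() {
        data = 42;
        ready = true;       // on ARM this store may become visible first
    }

    int reader() {
        while (!ready) { }  // may also spin forever without volatile
        return data;        // not guaranteed to see 42 under the JMM
    }
}
```

Declaring `ready` as `volatile` restores the ordering on every architecture; x86 merely masks the bug.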

------
Roboprog
Wow. That made my head hurt. And, I'm more convinced than ever before that
threads remain a TERRIBLE idea. Maybe Java 13 will have something like Erlang
tasks with isolated memory and message queues (only) or something...

Somehow I expected more on memory allocation, arrangement, or whatnot, rather
than concurrent access. Glad I clicked on it for the nightmarish skim,
though.

~~~
sievebrain
Complicated things are not inherently terrible ideas: your CPU is complex, but
you have compilers to abstract you from it. Likewise, the ways your multi-core
machine shares memory are complicated, and the JMM partly abstracts them, but
really you're meant to use much higher-level constructs. The JMM matters to
people writing lock libraries and compilers, not to ordinary users.

It's a common logical fallacy that you can simplify a complex solution by
simply redefining the problem it solves as not a problem. The "Erlang model"
you discuss is in fact the model the entire computing world used before
threads were invented and there were only processes. But threads weren't
invented for shits and giggles, they were invented for good reasons!

~~~
e12e
> But threads weren't invented for shits and giggles, they were invented for
> good reasons!

I remain unconvinced ;-) It always seemed to me, that if threads were such a
great idea, operating systems should expose threads, and not processes. In a
way NT and Linux took different paths here - on Windows processes are
expensive, on Linux threads and processes are pretty much the same in terms of
overhead (but threads obviously allow sharing memory, for better or worse).

I've long suspected that if threads really could "be done right", they
would've been preferred to processes. But just like people worked really hard
to get hardware protected memory to be fast enough to be usable, I think
working hard to get processes to be "fast enough" is more viable than "getting
threads right". Amusingly, I suppose Erlang is actually an example of how
mixing the two ideas can be a great idea: Don't let programmers know their
"process" is a scheduled thread, force messaging over shared memory - and reap
the benefit of light weight processes and strong(-ish) protection between
threads...

------
abalone
I wonder whether Java will get displaced by Swift on the server at some point.
Anyone want to place bets?

No garbage collection, so no GC performance hiccups. No explicit threading for
easier concurrency.[1] Optionals for protection from null pointer exceptions.
These are 3 of the biggest problems in server-side Java programming, I'd say.
That's before we even get to the elegance of the syntax.

[1] [https://developer.ibm.com/swift/2016/02/22/talking-about-swift-concurrency-on-linux/](https://developer.ibm.com/swift/2016/02/22/talking-about-swift-concurrency-on-linux/)

~~~
johncolanduoni
> No garbage collection, so no GC performance hiccups.

If your data models are naturally highly cyclic, dropping a GC can range from
painful to near impossible. I think a better solution for most server-side
code (from both an ease-of-use and a performance standpoint) is a hybrid
GC/arena approach, centered around tasks.

If most of your work and memory churn centers around discrete and relatively
short-lived requests, you can allocate all objects whose lifetimes are bounded
by the duration of the transaction in a thread-local arena. You can't beat
unsynchronized bump-pointer allocation, and you can simply throw the memory
away at the end.
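
A rough sketch of that idea in Java (hypothetical; the class and method names are mine, and a real implementation would hand out typed objects rather than raw buffer slices):

```java
import java.nio.ByteBuffer;

// Per-request bump-pointer arena backed by one ByteBuffer. Allocation is
// a single offset bump; freeing everything at end-of-request is one reset.
final class RequestArena {
    private final ByteBuffer backing;

    RequestArena(int capacityBytes) {
        backing = ByteBuffer.allocate(capacityBytes);
    }

    // Hand out a slice; no per-object free and no synchronization needed,
    // because the arena is local to the thread handling the request.
    ByteBuffer alloc(int size) {
        if (backing.remaining() < size)
            throw new OutOfMemoryError("arena exhausted");
        ByteBuffer slice = backing.slice();
        slice.limit(size);
        backing.position(backing.position() + size);
        return slice;
    }

    // "Throw the memory away": one pointer reset reclaims everything.
    void reset() {
        backing.clear();
    }

    // One arena per worker thread, reset between requests.
    static final ThreadLocal<RequestArena> PER_THREAD =
        ThreadLocal.withInitial(() -> new RequestArena(1 << 20));
}
```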

I've been really disappointed with the lack of innovation in the memory
management space. Rust is the only recent language I've seen that did
something more significant than choose between GC and reference-counting.

~~~
fpoling
Working with cycles under reference counting is not impossible. The key is
good memory-reporting tools. Yes, even with tools, identifying a memory leak
through a cycle under reference counting can sometimes be painful. However,
the fix itself is typically very local and easy to test. Compare that with a
C#/Go/Java application after it turns out that GC-induced latency is
unacceptable: the fix could be a massive rewrite that makes the resulting code
a maintenance nightmare. And while in the case of Java there are at least some
knobs to tune the GC that could be enough to fix the problem, with Go or C#
a code rewrite is the only solution.

As for local object pools, they are nice as long as their size is smaller than
the CPU cache. Without control over the pool size it is very easy to
accumulate so much local garbage that it starts to hurt performance.

~~~
johncolanduoni
The problem with both approaches is how binary they are. GC latency is an
issue if the active part of the heap gets too big and/or any strategies you
have for dealing with short lived garbage (e.g. generations) get overwhelmed.
The simple solution is to give flexibility in how allocations are done. C#
does this a bit but makes you pick per type, instead of per instance. Go goes
a bit further, but it still only leaves you with the options of stack
allocated/containment and GC. What would be ideal is having stack allocated,
ARC, and GC all be options in the same program.

Another advantage of the task-focused approach is that you can optionally
collect the arena if it has gotten too big _without_ halting the whole program
(just the running task). In this case, you may have latency on one request but
it need not affect latency on other requests.

> As for local object pools, they are nice as long as their size is smaller
> than the CPU cache. Without control over the pool size it is very easy to
> accumulate so much local garbage that it starts to hurt performance.

How is this different from the ARC case? If your working set fits in the
cache, great, if it doesn't, then you'll have problems. If your
allocation/release patterns are such that the allocator keeps handing you the
same exact pieces of memory, then you're basically doing stack allocation. Not
to mention that RC doesn't help the cache-unfriendliness that comes from heap
fragmentation.

~~~
fpoling
I really wish arenas with optional GC were better supported in mainstream
languages. Erlang got it so right. That, plus reference counting for
global structures where the type system or API guarantees the absence of
cycles, should allow for responsive and maintainable applications without the
nightmare of manual memory management.

------
Sarkie
I have a task coming up about reducing the memory of some WebLogic Java
applications, I'm not very familiar with Java.

Does anyone suggest a book / site / tool I can use?

Currently they have Min/Max of 4 GB and Object of 2 GB for the Java args, and
I want to delve into it to see more about actual memory use etc.

~~~
jmick
VisualVM can do that. It helps to have some background on the different memory
spaces and JVM options. Ken Sipe's "Tuning the JVM" and the book he recommends
are good starting material. It is a highly nonintuitive skill.
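
For a quick first look before firing up VisualVM, you can also ask the JVM itself (a trivial sketch; `maxMemory()` roughly corresponds to the -Xmx setting):

```java
// Print the JVM's own view of heap usage from inside the application.
public class HeapPeek {
    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        long usedMb = (rt.totalMemory() - rt.freeMemory()) >> 20;
        long maxMb  = rt.maxMemory() >> 20;  // upper bound set by -Xmx
        System.out.println("used=" + usedMb + "MB max=" + maxMb + "MB");
    }
}
```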

~~~
Sarkie
thank you very much!

------
zalg0
That zalgo toward the end made me smile.

