
Why does this Java program terminate despite appearing that it shouldn't? - patmcguire
http://stackoverflow.com/questions/16159203/why-does-this-java-program-terminate-despite-that-appears-that-it-shouldt-and
======
rayiner
One of my biggest WTF moments as a programmer was dealing with a high
performance spectrum analyzer. It was a cost-no-object bit of experimental
military hardware, and was very impressive from that perspective. But the
software was buggy. We would see lockups, as well as discontinuities in the
readouts, which were screwing up our post-hoc data analysis. When I dug into
the code, I realized that the program used several different threads, but
there was not a lock or other synchronization mechanism in sight. You had a
thread pulling data off a socket from the device and storing it in a (single)
memory buffer, a thread writing commands through a socket to the device, a
thread pulling data out of the buffer to draw it to a waterfall plot, and a
thread writing data to disk. The lockups happened because the device didn't
like to change the scan state while reading data, and there was no
synchronization between the command thread and the readout thread. The
discontinuities happened because there was a single buffer shared by the
readout thread and the write out threads, with no synchronization. So as one
thread was writing out a sample, the readout thread was replacing part of that
data with the next sample.

I remember grepping the source for pthread_mutex or pthread_cond and slowly
realizing that whoever wrote the code didn't even see the need for locks. To
be fair, I later heard it was some poor hardware guy who knew some C who got
roped into writing the software as an afterthought to show off the awesome
hardware, a common problem with hardware companies.

~~~
_pmf_
> I remember grepping the source for pthread_mutex or pthread_cond and slowly
> realizing that whoever wrote the code didn't even see the need for locks. To
> be fair, I later heard it was some poor hardware guy who knew some C who got
> roped into writing the software as an afterthought to show off the awesome
> hardware, a common problem with hardware companies.

"Oh, you wrote a quick demo for the sales people? Well, now it's a product
sold for some ten thousand!"

~~~
rdtsc
You joke, but in military contract world that is totally fine. Everyone
already got paid and bugs will certainly not stop the acquisition process in
the future. It is all about who knows who and more about jumping around red
tape certifications, ATOs, ATCs, rubber stamps and so on.

That is often why soldiers on the ground end up with overpriced yet crappy
tools and weapons.

~~~
rayiner
To be fair, it's not just red tape. If you are the Army and your one-of-a-kind
custom-spec piece of hardware has software bugs, what exactly are you
going to do? Buy the compatible version produced by the competing vendor?

~~~
rdtsc
Ideally you have prototypes and tests beforehand that should discover these
issues. Then in the future you would pick a different vendor. You can also set
up your contract to account for maintenance and for penalties for serious
bugs.

A lot of things are possible, but they are not happening; the incentives are
just not there.

------
boothead
How can anyone program sanely in the presence of this:

currentPos = new Point(currentPos.x+1, currentPos.y+1); does a few things,
including writing default values to x and y (0) and then writing their initial
values in the constructor. Since your object is not safely published those 4
write operations can be freely reordered by the compiler / JVM.

So from the perspective of the reading thread, it is a legal execution to read
x with its new value but y with its default value of 0 for example. By the
time you reach the println statement (which by the way is synchronized and
therefore does influence the read operations), the variables have their
initial values and the program prints the expected values.
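One way out of this, sketched under assumptions: make the point immutable with `final` fields and publish it through a `volatile` reference. Only `currentPos` and `Point` come from the question; `SafePoint`, `SafePublication` and `countViolations` are hypothetical names for illustration. Note that the reader copies the reference once, since reading `currentPos` twice could pick up two different objects.

```java
// Hypothetical sketch: an immutable point published through a volatile field.
final class SafePoint {
    final int x;   // final fields are visible to other threads once the constructor finishes
    final int y;

    SafePoint(int x, int y) {
        this.x = x;
        this.y = y;
    }
}

class SafePublication {
    // volatile: a write of the reference happens-before any later read of it
    static volatile SafePoint currentPos = new SafePoint(1, 2);

    // The calling thread updates the point while a reader checks x + 1 == y.
    static int countViolations(int updates) {
        final int[] violations = {0};
        Thread reader = new Thread(() -> {
            while (!Thread.currentThread().isInterrupted()) {
                SafePoint p = currentPos;   // read the reference exactly once
                if (p.x + 1 != p.y) {
                    violations[0]++;
                }
            }
        });
        reader.start();
        for (int i = 0; i < updates; i++) {
            SafePoint p = currentPos;
            currentPos = new SafePoint(p.x + 1, p.y + 1);
        }
        reader.interrupt();
        try {
            reader.join();   // after join, the reader's count is visible here
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return violations[0];
    }

    public static void main(String[] args) {
        System.out.println("violations: " + countViolations(1_000_000));
    }
}
```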

I'm not anywhere near smart or careful enough for that... I think I'll stick
with Haskell.

~~~
mikeash
The One Commandment of multithreaded programming:

1\. Thou Shalt Not touch shared data without synchronization.

You _must_ know what data is shared, and you _must_ synchronize all access to
shared data (with one exception, when _all_ access is read-only).

People try to find exceptions to this One Commandment. They think, I'm only
writing one value, so what's the harm? Well, this is the harm. Follow the One
Commandment and you'll be safe(r).
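A minimal Java sketch of following the commandment, assuming a pair of fields that must always satisfy `x + 1 == y`. The class and method names are made up for illustration; the point is only that every read and every write goes through the same lock.

```java
// Hypothetical sketch: all access to the shared pair goes through one lock.
class GuardedPosition {
    private int x = 1;   // guarded by "this"
    private int y = 2;   // guarded by "this"

    synchronized void advance() {
        x++;
        y++;
    }

    // Reads both fields under the same lock, so it never sees a half-finished update.
    synchronized boolean isConsistent() {
        return x + 1 == y;
    }

    // One thread writes while the caller keeps checking the invariant.
    static boolean runCheck(int updates) {
        GuardedPosition pos = new GuardedPosition();
        Thread writer = new Thread(() -> {
            for (int i = 0; i < updates; i++) {
                pos.advance();
            }
        });
        writer.start();
        boolean ok = true;
        while (writer.isAlive()) {
            ok &= pos.isConsistent();
        }
        try {
            writer.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return ok && pos.isConsistent();
    }

    public static void main(String[] args) {
        System.out.println("consistent: " + runCheck(1_000_000));
    }
}
```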

~~~
danjayh
My exception: If all operations on the shared data are atomic, and always
leave it in a valid state, then you can get away without explicit
synchronization. Unfortunately, the only way you can really achieve this
(without some sort of a synchronization API) is to use assembly code, and take
advantage of architecture-specific data access mechanisms.

~~~
klodolph
But you'll need memory barriers. Let's say you have two variables, x and y,
satisfying x >= y at all times, and the values of x and y are never allowed to
decrease.

    
    
        // Thread 1
        atomic_inc(&x);
        atomic_inc(&y);
    
        // Thread 2
        local_y = atomic_read(&y);
        local_x = atomic_read(&x);
    

If you go through it logically, it's always in a valid state, right?

No. You need explicit synchronization. If we start at x=1, y=1, then thread 2
can see x=1, y=2, which seems impossible!

Or in short, your exception sucks. (By "sucks" I mean it is not sufficiently
detailed to prevent errors.)

~~~
rayiner
This is only the case on architectures where atomic operations don't imply
memory barriers. On x86, there is a total order on atomic operations, so
another CPU will never see the write to "y" before the write to "x". Indeed on
x86, there is total order on stores, so if you had two threads like that,
where one thread only did reads and the other thread did updates, you can get
away with an unlocked increment (x++; y++).
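For comparison, a sketch of the same two-counter setup in Java, where the per-architecture question mostly goes away: `AtomicInteger` reads and writes have `volatile` memory semantics, so a reader that loads `y` before `x` should never observe `x < y`. The class and method names are hypothetical, and this says nothing about plain `int` fields, which get no such guarantee.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch: x is always incremented before y, so x >= y should hold.
class OrderedCounters {
    static final AtomicInteger x = new AtomicInteger(1);
    static final AtomicInteger y = new AtomicInteger(1);

    static int countViolations(int updates) {
        x.set(1);
        y.set(1);
        int violations = 0;
        Thread writer = new Thread(() -> {
            for (int i = 0; i < updates; i++) {
                x.incrementAndGet();   // x first
                y.incrementAndGet();   // then y
            }
        });
        writer.start();
        while (writer.isAlive()) {
            int localY = y.get();   // read y first
            int localX = x.get();   // then x, which can only have grown since
            if (localX < localY) {
                violations++;
            }
        }
        try {
            writer.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return violations;
    }

    public static void main(String[] args) {
        System.out.println("violations: " + countViolations(1_000_000));
    }
}
```

Swapping the two reads (x first, then y) would make apparent violations possible even here, because the writer can run between them.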

------
sehugg
It's amazing what kind of yarns people on SO will spin to get help with their
homework.

~~~
thurn
I often feel that it's necessary to stretch the truth a little bit on SO.
Otherwise you ask "How can I do X?" and all your answers will be "Don't do X.
Use Y instead."

~~~
m_myers
That happens because of the "old shoe or glass bottle?" question.[1] When
newbies (or even oldbies) ask how to do something, often the real question
could be solved much better by taking a completely different tack. This has
also been called the "XY problem".[2]

The best answers would ideally answer the question as asked (or explain why it
can't be done) _and also_ show a still more excellent way.

[1]:
[http://weblogs.asp.net/alex_papadimoulis/archive/2005/05/25/...](http://weblogs.asp.net/alex_papadimoulis/archive/2005/05/25/408925.aspx)

[2]: [http://meta.stackoverflow.com/questions/66377/what-is-the-
xy...](http://meta.stackoverflow.com/questions/66377/what-is-the-xy-problem)

~~~
jamesbritt
_The best answers would ideally answer the question as asked (or explain why
it can't be done) and also show a still more excellent way._

There are times I've asked how to do X, and for various reasons X is really,
no kidding, _exactly_ what I wanted to do. Still, numerous people would question
my desire to do X, turning the discussion into an inquisition on motives or
skill comprehension.

I'm sure in some odd way they all meant well, but they were spending a lot of
cycles not actually answering the question.

In those cases the better strategy is to, under some crafty pretext, bluntly
assert that _X cannot be done._

This will evoke numerous rebuttals with full details on just how incredibly
_wrong_ and _foolish_ that claim is.

Both cases play to a trait all too common to many people on discussion boards:
replying to questions with the primary goal of showing that you are, in fact,
_smarter_ than everyone else.

------
anaphor
If you look through the questioner's history it seems like they enjoy making
these sorts of threading/synchronization errors, as well as getting
angry/trolling. I wouldn't be surprised if the story about losing the 12 mil
electron microscope is total BS designed to garner rep (I see it's getting
tons of votes).

~~~
cruise02
Most of those upvotes are from one day though, so they only get about 200
reputation due to the cap. (Also, they gave away 50 points on a bounty for the
top answer, so I'm not sure if reputation was the motive or not.)

~~~
anaphor
That is a good point, although I still find it suspicious, especially given
some of their previous questions. E.g. the one where he/she created two
Haskell data types that looked the same but were using different characters in
their names and then complained it wasn't working. Really?

------
viraptor
Oh come on! This title rewriting is getting silly now. The Java application is
not the story - there are thousands of those on SO. The story is the (real or
made up) destruction of equipment by a failing program...

------
robomartin
At the core, the use of threads for any embedded system required to operate in
hard real time is a mistake. Threads and interrupts are the minefields of
real-time embedded systems.

Based on study and years of experience I conclude that the ONLY way to build
such a system is to use a time-triggered approach as opposed to
event-triggered. No threads. Only one interrupt --the timer-- in the entire system
if at all possible.

This is where I fall back to the raw simplicity of C. You really have to work
hard to do stupid shit like threads in C. By that I mean that you have to go
out of your way to bring in the code and a framework that might allow you to
do that. I've successfully written and applied at least a couple of RTOS's for
use in mission critical applications (definition: you fuck up and someone gets
hurt or dies). Again, only one interrupt. No threads. Carefully --and I do
mean very carefully-- shared data between tasks only if absolutely necessary.

A corollary to this is that I could not fathom programming an electron
microscope (the subject of the SO question) or anything that might lead to
millions of dollars in losses with Java. Couldn't pay me enough.

Anyone interested in working with hard real-time embedded systems would be
wise to study the following well-known books:

    
    
      - Patterns for Time-Triggered Embedded Systems, Michael J. Pont
      - Embedded Systems Building Blocks, Jean J. Labrosse
      - uC/OS-III, Jean J. Labrosse  (uC/OS-II by the same author includes 
        the source to the prior version of the OS)
      - Doing Hard Time, Bruce Powel Douglass
      - Operating System Concepts, Silberschatz, Galvin, Gagne
    

There are other good books on the subject. The above should provide a very
solid foundation.

Don't just hack away without the necessary background. The consequences of
making mistakes out of sheer ignorance could be dire.

------
b0b0b0b
I think the real wtf was testing in production. At least their lab now has an
environment to do their qa on.

------
neya
I'm curious, could he (or you) have written a program in any other languages
(like Go) and not have worried about these issues?

Honestly, I got some mild goose-bumps after realizing the number of Zeroes
that 12 million had.

~~~
zemo
you could, but it's not very clear why you would want this to be multithreaded
to begin with, since it appears the author wants the program to quit as soon
as the value fails validation. It looks like it's multithreaded just for the
hell of it.

Anyway, you could do it in Go like this:
[http://play.golang.org/p/3Wy_S42jVj](http://play.golang.org/p/3Wy_S42jVj)

basically what you do is you create a goroutine that reads a pointer to the
Point value, checks it, and if it's bad, exits. In your main loop, send a
*Point down the channel. That basically means: I made this thing. It's yours
now. Do what you want with it. Then in the reporting channel, the value is
read and checked. The same pointer is sent back in a reply, meaning "I'm done
with this, you can have it back now". You almost certainly shouldn't be doing
this; I'm just doing this to emulate the fact that the original example is
using the same reference from both threads. Yes, this is very, very weird and
no, you should not do this in your own programs. Anyway, we take that pointer
back, dereference it, and assign it to a _new_ Point value, which is
completely weird and I still don't understand. (presumably the original author
mistakenly thought this was an atomic way to set both fields on the Point.) In
this way, the two goroutines (the main and reporting goroutine) are able to
use channels to synchronize access to some shared memory.
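For readers who think in Java, roughly the same hand-off can be sketched with two `SynchronousQueue`s standing in for the Go channels. This is not a translation of the linked playground code, just an illustration of the "it's yours now, give it back when done" idea; `HandOff`, `MutablePoint` and `run` are made-up names.

```java
import java.util.concurrent.SynchronousQueue;

// Hypothetical sketch: ownership of one mutable point is passed back and forth.
class HandOff {
    static final class MutablePoint {
        int x = 1;
        int y = 2;
    }

    static boolean run(int rounds) {
        SynchronousQueue<MutablePoint> toChecker = new SynchronousQueue<>();
        SynchronousQueue<MutablePoint> toWorker = new SynchronousQueue<>();
        boolean[] ok = {true};

        Thread checker = new Thread(() -> {
            try {
                for (int i = 0; i < rounds; i++) {
                    MutablePoint p = toChecker.take();   // the checker owns p now
                    if (p.x + 1 != p.y) {
                        ok[0] = false;
                    }
                    toWorker.put(p);                     // done with it, hand it back
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        checker.start();

        try {
            MutablePoint p = new MutablePoint();
            for (int i = 0; i < rounds; i++) {
                p.x++;
                p.y++;
                toChecker.put(p);       // give the point away
                p = toWorker.take();    // wait until it comes back
            }
            checker.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return ok[0];
    }

    public static void main(String[] args) {
        System.out.println("all checks passed: " + run(10_000));
    }
}
```

Only one thread touches the point at any moment, and each queue transfer orders the writes before the reads, so no extra lock is needed.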

This is a pretty silly use of concurrency, though, because in this case, it's
probably more clear to just use a mutex to lock and unlock your Point. Again,
yes, this is sloppy and weird, but it represents the same behavior as the
original program, but properly synchronized.

~~~
taway2012
IMHO, there is a subtle difference between your version and the version
posted.

You're assuming a "real" producer-consumer relationship. But the posted code
seems to imply a sort of "supervisor-worker" relationship.

The "supervisor" "snoops" on the worker's state and raises an alarm if the
worker is doing something weird.

As you stated at the end, synchronizing around currentPos seems to be most
reasonable solution (assuming that's possible). That guarantees that no Point
method is running while currentPos's state is being examined.

~~~
zemo
I'm under the impression that the supervisor wants to stop the worker as soon
as it's doing something wrong; not do something like check every so often, and
stop only when the supervisor notices.

Anyway, what you're describing is pretty easy too. Basically all you do is
make a channel of Point, and have the supervisor read on that. The worker just
does some work, and tries to send his value down the channel. If someone's
listening, send the value. This is pretty straightforward, but probably not
safe in the real world, because the worker just keeps on working; it doesn't
wait for a confirmation that its work is ok.
[http://play.golang.org/p/J8Xgh8S18b](http://play.golang.org/p/J8Xgh8S18b)
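
A rough Java analogue of that "send only if someone's listening" pattern, assuming `SynchronousQueue.offer` as the non-blocking send (it returns immediately when no receiver is waiting). The names `Snooping`, `Snapshot` and `run` are invented for this sketch; it is not a port of the playground code.

```java
import java.util.concurrent.SynchronousQueue;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch: the worker never waits; snapshots are dropped if nobody listens.
class Snooping {
    static final class Snapshot {
        final int x;
        final int y;

        Snapshot(int x, int y) {
            this.x = x;
            this.y = y;
        }
    }

    static boolean run(int steps) {
        SynchronousQueue<Snapshot> channel = new SynchronousQueue<>();
        Thread worker = new Thread(() -> {
            int x = 1;
            int y = 2;
            for (int i = 0; i < steps; i++) {
                x++;
                y++;
                channel.offer(new Snapshot(x, y));   // non-blocking: false if no listener
            }
        });
        worker.start();

        boolean ok = true;
        try {
            while (worker.isAlive()) {
                // The supervisor listens briefly, then re-checks whether the worker is done.
                Snapshot s = channel.poll(1, TimeUnit.MILLISECONDS);
                if (s != null && s.x + 1 != s.y) {
                    ok = false;
                }
            }
            worker.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return ok;
    }

    public static void main(String[] args) {
        System.out.println("all observed snapshots consistent: " + run(1_000_000));
    }
}
```

As with the Go version, the worker keeps going regardless, so a bad state is only noticed if the supervisor happens to be listening at that moment.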

If you want to structure it such that the supervisor can stop the worker, ask
the worker what it's doing, look at the data, and then reply with "ok, you can
continue now", there's a handful of ways to do it. Now we're getting pretty
real-world, and things get a little bit more intricate, but you can do it like
this:
[http://play.golang.org/p/hXYNmCm8lg](http://play.golang.org/p/hXYNmCm8lg)

the trick being that you send a channel over another channel. There's a
handful of ways you could structure this; there may be a cleaner way.
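
The "channel over a channel" trick has a loose Java counterpart: put a request object on a queue, where the request carries its own reply queues. The sketch below assumes that shape; `Inspection`, `report`, `mayContinue` and `inspect` are hypothetical names, and the worker simply spins on fake work between requests.

```java
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.SynchronousQueue;

// Hypothetical sketch: each request carries the queues used to answer it.
class StopAndInspect {
    static final class Inspection {
        final SynchronousQueue<int[]> report = new SynchronousQueue<>();
        final SynchronousQueue<Boolean> mayContinue = new SynchronousQueue<>();
    }

    // Runs the given number of inspections and returns how many passed.
    static int inspect(int inspections) {
        if (inspections < 1) {
            throw new IllegalArgumentException("need at least one inspection");
        }
        LinkedBlockingQueue<Inspection> requests = new LinkedBlockingQueue<>();
        Thread worker = new Thread(() -> {
            int x = 1;
            int y = 2;
            try {
                while (true) {
                    x++;                                 // one unit of "work"
                    y++;
                    Inspection req = requests.poll();    // non-blocking check for a request
                    if (req != null) {
                        req.report.put(new int[] {x, y});   // pause and show our state
                        if (!req.mayContinue.take()) {      // wait for the verdict
                            return;
                        }
                    }
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        worker.start();

        int passed = 0;
        try {
            for (int i = 0; i < inspections; i++) {
                Inspection req = new Inspection();
                requests.put(req);
                int[] state = req.report.take();
                boolean ok = state[0] + 1 == state[1];
                if (ok) {
                    passed++;
                }
                boolean last = (i == inspections - 1);
                req.mayContinue.put(ok && !last);   // false stops the worker
                if (!ok) {
                    break;
                }
            }
            worker.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return passed;
    }

    public static void main(String[] args) {
        System.out.println("inspections passed: " + inspect(5));
    }
}
```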

------
c0deporn
You would think that equipment valued at $12m would have safety features to
prevent damage from rogue commands. Interesting.

~~~
mpyne
Those who do not read the history of Therac-25 apparently _are_ doomed to
repeat it.

If the hardware can be destructive under the control of software then you
_must_ have _hardware_ interlocks, not just software safety features.

~~~
kwantam
Huh. I learned a different lesson from Therac-25: "never write mission-
critical software." :)

For those who haven't read it,
[http://sunnyday.mit.edu/papers/therac.pdf](http://sunnyday.mit.edu/papers/therac.pdf)

~~~
mikeash
There are three kinds of programmers:

1\. Those who read about Therac-25 and decide never to work on any systems
where bugs could threaten lives (your chosen course, and mine).

2\. Those who read about Therac-25, get frightened, and work _very carefully_
on such systems.

3\. Those who read about Therac-25 and think, "Those idiots! Good thing I'd
never make a mistake like that."

The prevalence of (3) in all professions and walks of life is kind of
frightening.

~~~
awj
You should walk through a building construction project sometime. Your "all
professions and walks of life" part is ... disturbingly accurate.

~~~
mikeash
It no doubt applies to doctors too, which is even more disturbing.

~~~
masklinn
Not as disturbing as finding this attitude in hospital construction projects.

------
methehack
Without looking at it too closely, there's no synchronization on the point.
The synchronized this (with an empty block - wtf -- that's meaningless) has
nothing to do with the point...so the point is possibly stale. I haven't
assessed the logic more than that to see if that's a possible state that would
make the program exit, but it's the first thing I noticed. Also, this program, as
others have pointed out, is weirdly complex -- and that's a euphemism.

~~~
conroe64
I think the logic is more to elicit a strange occurrence (bug, dare I say?)
that occurs only under some weird circumstances. In this case, this
behavior would be impossible unless the value of a variable, currentPos, is set
to an object before that object's constructor finished, and all in the same
thread!

The only possible explanation is that the JIT decided to reorder the execution
so the variable was set to memory allocated for the object, and then ran the
object's constructor.

It's all very surprising behavior.

------
xedarius
That program is shocking. I'd be interested if there was even a need for
threading, perhaps you could have just time stepped your actors with a simple
Tick interface.

for all actors actor.tick(timeDelta);
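
A minimal single-threaded sketch of that idea in Java. Only `tick` and `timeDelta` come from the line above; the `Actor` interface, `Mover`, `Monitor` and the millisecond unit are assumptions for illustration. With everything on one thread, the monitor can never catch the mover halfway through an update.

```java
// Hypothetical sketch: one loop steps every actor in turn, no threads involved.
class TickLoop {
    interface Actor {
        void tick(long timeDeltaMillis);
    }

    static final class Mover implements Actor {
        int x = 1;
        int y = 2;

        @Override
        public void tick(long timeDeltaMillis) {
            x++;   // both fields change within a single tick
            y++;
        }
    }

    static final class Monitor implements Actor {
        private final Mover watched;
        boolean sawBadState = false;

        Monitor(Mover watched) {
            this.watched = watched;
        }

        @Override
        public void tick(long timeDeltaMillis) {
            if (watched.x + 1 != watched.y) {
                sawBadState = true;
            }
        }
    }

    // Steps all actors for the given number of ticks; true if the monitor stayed happy.
    static boolean run(int ticks, long timeDeltaMillis) {
        Mover mover = new Mover();
        Monitor monitor = new Monitor(mover);
        Actor[] actors = {mover, monitor};
        for (int t = 0; t < ticks; t++) {
            for (Actor actor : actors) {
                actor.tick(timeDeltaMillis);
            }
        }
        return !monitor.sawBadState;
    }

    public static void main(String[] args) {
        System.out.println("monitor stayed happy: " + run(1_000, 16));
    }
}
```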

------
wissler
Side question: with $12M of equipment, wouldn't it make sense to have an
independent limiter of some kind to prevent damage? Is there something about
this equipment that makes that greater than $12M in difficulty? Because
relying on having no bugs just so your hardware isn't destroyed seems like a
bad idea...

~~~
saym
I'm willing to bet the question has a fabricated back-story. If you look at
the user's question history: there are some interesting ones, but they all
could be justified as a collegiate homework problem.

~~~
quasque
Also, $12 million for an electron microscope seems somewhat on the expensive
side, by an order of magnitude.

~~~
Daniel_Newby
The poor bastards might have had the entire reticle (photo mask) set for
fabricating a microchip loaded in the microscope for inspection.

------
chris-allnutt
THIS! is why you should always write tests.

~~~
wtetzner
Actually, concurrency is one of the areas where tests aren't particularly
reliable. Whether the tests pass might depend on how the threads happened to
be scheduled on that particular run.
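
To make that concrete, here is a deliberately broken Java counter, with invented names. A test asserting that the total equals `2 * perThread` may pass on one run and fail on the next, because the unsynchronized `count++` can lose updates depending on how the two threads interleave.

```java
// Hypothetical sketch: a racy counter whose result depends on scheduling.
class RacyCounterDemo {
    static int count;   // deliberately unsynchronized

    static int racyTotal(int perThread) {
        count = 0;
        Runnable work = () -> {
            for (int i = 0; i < perThread; i++) {
                count++;   // read-modify-write, not atomic
            }
        };
        Thread a = new Thread(work);
        Thread b = new Thread(work);
        a.start();
        b.start();
        try {
            a.join();
            b.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return count;
    }

    public static void main(String[] args) {
        int perThread = 1_000_000;
        System.out.println("expected " + (2 * perThread) + ", got " + racyTotal(perThread));
    }
}
```

A single passing run of such a test says very little; the only thing that can be relied on is that the total never exceeds the expected value.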

