Wow. This is a pretty big bug.
The official bug report: https://bugs.openjdk.java.net/browse/JDK-8204322 as Stuart Marks, a jvm architect, posted.
I'm surprised it stayed uncaught for so long.
For the unfamiliar, the way string concatenation is compiled changed with invokedynamic.
Previously, it would compile to either Sting.concat calls or transform into a StringBuilder, but with invoke dynamic we can leave it to the runtime to optimize when it first runs.
I'm not surprised. Java 9 has minor adoption, and having side effects on the left hand side of an assignment operation is really really bad practice.
So you'd need a codebase that (1) was updated or written in a very new Java version, (2) contained the smelly code and (3) had a harmful double side-effect that got noticed. That's a pretty rare combination.
> having side effects on the left hand side of an assignment operation is really really bad practice.
No it isn't. Something like this:
data[index++] += 'a'
Would easily pass any reasonable code review. It's not that smelly or unusual.
It might be unusual with strings, which makes this less common, but it's not an unusual format at all. It's a non-trivial part of the reason ++ operator exists in the first place.
> Would easily pass any reasonable code review. It's not that smelly or unusual.
I wouldn't pass that in a code review without comment!
I'd ask the author to consider breaking down the two actions into two separate lines so that it's more clear that there are two actions and what the sequential ordering of them is.
Inline side effects like that are a mess. Without looking it up, I could not even tell you if the sequence point ordering of that kind of expression in general is defined in C!
> Inline side effects like that are a mess. Without looking it up, I could not even tell you if the sequence point ordering of that kind of expression in general is defined in C!
What are you calling "inline side effects" here? The ++ operator or the += operator? Because the index-of a ++ operator is extremely well defined.
In fact that code snippet works in every language with a ++ operator exactly as you'd expect, there's really no tricks in play here. Drop that code into C, C++, Java, JavaScript, etc... and it does the exact same thing in all of them via well-defined rules. It will append 'a' to data[index] then increment index.
We're not even into the wierd land of mixing bitwise & boolean operators on the same line where you operator precedence becomes non-obvious.
Sequence point ordering is not relevant to the example code (assuming of course that there are no macros involved here):
data[index++] += 'a';
All that matter is what the ++ operator evaluates to - the sequencing of the side-effect modification to 'index' is irrelevant. The ++ operator evaluates to the value of 'index' before the increment is applied, so we know that the is array location which will be modified.
As it happens, in C the modifications to the location in the 'data' array and the 'index' variable are not sequenced relative to each other, but that is not observable.
In Java, the sequence of point ordering is remarkably clear and I would expect professional to understand it. I tend t be conservative in syntax and avoid less known parts, but this part is well known.
You dont need to be professional to understand that code. It is not obscure what it means. You need someone who knows what ++ operator means and that is usually lesson 1 or 2 in Java for complete beginners class.
If you never ever worked with Java and coded in C, then it might be different. Typical Java programmer, even weak one, is fluent in Java 101, but does not really know what might confuse C programmer. In any case, I would expect C programmer finding job as Java programmer for some reason to simply ask someone in office.
It's not about that specific code - it's about in general having side effects inline in an expression.
I can't recall perfectly from the top of my head the rules for the ordering of inline side effects in general in C and Java, and I work in the VM research group at Oracle. Can you? Are you sure about that?
I’m not really sure what you mean by “the rules for the ordering of inline side effects”. I think they’re usually relatively simple to understand, such as with postfix and prefix incrementation operators; but I’m sure there are other cases I’m not aware of. Care to expand on these?
Is everyone deliberately ignoring the bit where I said I was talking about inline side effects and how they sequence in general and how it wasn’t about this specific code? Inline side effects are subtle and confusing. Best to avoid in general in my opinion.
Edit:
You’ve completely changed your comment out from underneath me after I replied. That’s rude.
To answer your new question - you're not alone in not know what I mean by sequencing of side effects! Not many people understand the rules for sequencing side effects! That's my point it's subtle. The JVMLS gives you the rules, but it also says
> It is recommended that code not rely crucially on this specification. Code is usually clearer when each expression contains at most one side effect, as its outermost operation, and when code does not depend on exactly which exception arises as a consequence of the left-to-right evaluation of expressions.
I'm just following that advice!
For C it's worse - what order are two side effects run in a C expression? Usually the rule is 'it's up to the compiler - it can do them in any order it wants.'
The ordering/sequencing of side effects is not the problem here. It doesn't matter at all whether index updates before data, or data updates before index. Few people know, and even fewer need to know.
What you need here is just knowing what "index++" means, nothing about sequencing at all.
I could understand your objection with the original version of the code, where it referred to both "i++" and "i" in a single line. That depends on sequencing rules and is a bad idea to allow.
But this version, that only refers to "index++" once, doesn't have any subtleties.
My rule is really simple to explain to a junior - "don't use any inline side effects."
How do you explain your rule to a junior? You're going to have to tell them that inline side effects are fine, as long as you follow another set of rules about sequencing - a set of rules you admit most people don't know at all and fewer people understand correctly!
I use rules I can defend. Be defense, I mean can explain impact on readability or maintainability - personal preferences dont count. E.g., if I have to use "I don't like it" then it is bad rule. Also, I deal with code in question, not with imaginary different code. This particular line is plenty readable and maintainable. People typically don't know about X is argument, but then people really typically must make mistakes about X. This is shared industry standard is an argument, but again, then is must really be industry habit. Not just me wanting it to be so.
Also, if the junior cant comprehend reasoning, he/she will have hard time with expressions like "inline side effects". The word inline is not much used in Java world.
If there is style guide, then it is different matter and the argument is about usefulness of style guide itself (e.g. you follow the rule unless rule harms, even if the rule is overly broad). Absent style guide, imposing your personal preferences on good lines, because entirely different line would be unreadable is not something I can support. I used to follow such instructions, but don't anymore because such blind follow tactic lead to all kinds of problems for me.
When I worked with juniors, I wanted them to understand so that they can make own decisions as soon as possible. I did not wanted them to memorize hundreds of random rules.
index++ uses the variable and then increments it. From top of head, no doubt and 100% sure. ++index would do the opposite order. It is easy to remember and makes sense.
The ++ operator is not obscure at all and used more often then for example += operator.
What does the JVM language specification say about this? That's a pretty universal opinion we can all agree to follow.
15.7, Evaluation Order:
> It is recommended that code not rely crucially on this specification. Code is usually clearer when each expression contains at most one side effect, as its outermost operation, and when code does not depend on exactly which exception arises as a consequence of the left-to-right evaluation of expressions.
So when I review I advise, as per the spec, that expressions contain a single side effect.
If you think brevity means clarity, try reading some old Fortran code with comments stripped out and single-letter variable names like i, j, k, b, g, h, m, p, q, etc. (I say Fortran because in that era, such variable names were much more in vogue to save a couple of bytes here and there - but really, this can be done in any language).
Even simple algorithms (like finding the center of the circle through three given points) become puzzles. But sure, they are brief.
I don’t necessarily disagree. But I do think that all other things being equal, less code is better than more code - fewer bugs, easier to understand. The catch is what falls into “all other things being equal”. Where I’ve been frustrated with others is when they reject brevity in situations where it’s idiomatic for the language (e.g., ‘it’ as an implicit variable in Groovy list comprehensions that aren’t especially complicated). Forgive me if I sound crazy; maybe it’s just a reaction to having spent so many years in the Java community, where no name is long enough, and no amount of code is too much.
True, the world could do with less code that looks like Fizz Buzz Enterprise Edition :)
In this case, though, the OP makes a strong point that constructs like data[index++] = value; aren't idiomatic to Java - even the spec advises against this.
And in any case, side effects in assignments can be a tricky thing to understand. Sometimes they are necessary, as in
status = perform_computation_on_data(some_mutable_data);
But in the example discussed is not in this vein. I'd say it's not pretty for the same reason as cramming two statements on one line:
data[i]=value; i++;
is not pretty: it's trying to do two things with one line, which kind of defeats the point of breaking up code into lines.
So I don't disagree with you - but there's a happy medium between Fizz Buzz Enterprise and Code Golf - style code :)
Unlike `i`, `g`, or `p`, the variable `it` is an actual word in English, similar to other pronouns used as keywords, e.g. `this`, and other English word classes, e.g. `if`, `class`, and `try`, so your example from Apache Groovy doesn't really fit the argument.
I'd call this code out as unusual at the very least, particularly if this code appeared in a loop. In that case, I'd normally expect to use a for loop, where the ++ operator appears only in the increment part of the loop declaration.
It can easily happen when you are iterating through list while updating another array with data from list (conditionally or doing computation or whatever). Pretty common case.
There's a lot of cases where you can't "just use a for loop" such as if you're looping over something else, or if you're looping over 2 things. For example replace it with something like:
data[i++] += other[j++]
Yeah you could do this other ways, but it's not actually uncommon. You'll see this sort of thing in collection implementations regularly, for example, or things doing parsing. Not necessarily with the append operator as well, but certainly indexing into an array with a ++ variable.
Ugh. Today’s code sample on Sesame Street, brought to you by the letters “F” & “U”, and the number 2. Why would you subject somebody to something that edgy?
This is a quite embarrassing bug because it affects probably the most used class in Java.
However I think using += for the Strings is not that popular among experienced developers because they want to be sure that no unnecessary Strings are created in the pool.
I assume the execution times would be similar with the optimizations you mentioned, but according to Effective Java(3rd edition, published this year) using += for Strings in the loop is still 6,5 times slower than using StringBuilder. I need to check it myself.
I am, and i know other people who are. I'll be interested to see how the LTS situation shakes out for people not paying for Oracle support. I know some people who aren't bothering with LTSs, and are planning to upgrade every six months.
Does this only apply to shops that cannot easily upgrade to 12 once it's out? Or are there other support-related problems with adopting 9/10 for production for server-side apps?
Even if it isn't LTS, does it make sense to.stay on the 8, which will be end of life soon anyway?
At least, programming using modules makes the codebase ready for Java 11.
The only thing that I don't like is the removal of javapackager without substituting it (jlink cannot produce bundles with applications, only a light redistributed image)
Edit: This is obviously a bug and needs to be fixed. My post is more about how the syntax "+=" is usually explained as shorthand for "x = x + 1" (https://www.tutorialspoint.com/java/java_basic_operators.htm), when it's not strictly true (depends on what's in "x").
"+=" is [usually explained as] shorthand for repeating the left side. Why should
> foo[i++] += 1
evaluate differently than
> foo[i++] = foo[i++] + 1
?
The way it currently works is the way I would expect it to work, but I'm evidently in the minority here. I also don't write Java and since this is a change from Java 8, it needs to be fixed.
Let me state that again: yes, of course this bug needs to be fixed.
> The way it currently works is arguably the way it should work, irrespective of the spec.
The spec is how it should work, regardless of developer's opinions.
> Why should [a] evaluate differently than [b]?
In my opinion (and I'm guessing in the opinion of the spec authors, though I can't be sure), saying "x += y" means "add y to x", so I'd expect the language to:
1) Figure out what "x" is once
2) Add "y" to it
Your opinion seems to be that saying "x += y" means "desugar to x = x + y" and you expect the language to:
1) Figure out what "x" is as a value
2) Add "y" to the value of "x"
3) Figure out what "x" is as a location (which may differ from 1)
4) Store the result into "x"
To me step 3 is very surprising, that "x" may differ, which is why I think the spec's definition is the more intuitive one (plus, it doesn't require knowledge of "desugaring" things).
I understand this point and it's true that there would be some elegance if both statements were equivalent but I see greater utility without that equivalence:
Rather than a redundancy and sugar on + and = ("I do not want to type the left side twice"), it is a distinction I am making to the compiler ("I do not want you to evaluate the left side twice").
They're not the same thing semantically, even though the final x86/64 assembly might be the same after a compiler is done with it. '+' is addition but '+=' is addition "in place" which means that the calculation is performed in the location where it will be stored (although, again, these might look identical when talking about register allocation and whatnot in the final assembly). This is relevant to reordering compilers because it allows them to do certain optimizations that are normally hard to prove safe because of aliasing rules, especially in lower level intermediate forms where typing and other information might be lost.
Would you also argue that min(foo(), bar()) should evaluate one of the functions twice? When you explain the definition as "(a < b) ? a : b" it's only natural, even though such a macro is an infamously wrong way to implement min.
I think it's a big deal: introduced in 9 and fixed in 11? It suggests two things: Oracle has gaps in the regression test suite, and they don't consider compiler error such as these as a big issue.
interesting. no, it is not my issue - in my case nothing gets rendered in WebView, the view stays blank.
and another thing - how does one submit a defect report to Oracle? neither bugs.openjdk.java.net nor bugs.java.com have a clear link, and I can't find where one can register an account.
JavaFX HiDPI support on Windows also underwent breaking changes in minor Java 8 updates. It's probably for the best that JavaFX can have its own, independent version number now that it was kicked out of the JDK.
If I understand correctly "array[i++] += ..." alone is enough to trigger the bug which doesn't seem particularly tortured to me. Actually any "array[side_effect()] +=" would trigger the bug. That's pretty bad IMO.
Even if the function within the subscript were free if side effects, calling it twice would be a ridiculous performance regression. I’m no compiler developer, but this really seems like a rookie mistake.
Not so sure about that... let's say this bug is causing a minor inconsistency in any one of the thousands of databases built on top of Java, not the least one of the most popular databases Elasticsearch. We may not know yet, but this bug could theoretically be causing different output. In fact if we see anything from the Lucene project related to this bug, then we know for sure ES and other products are likely affected.
We've updated the link from https://stackoverflow.com/q/50683786 and the title from “Operator += is broken for Strings in Java 9 and 10” to the official issue page.
Adding side effects to your array index expression and depending on order of their evaluation is a bad idea anyway.
Unfortunately, the `++` / `--` made it to Java, and are still used sometimes, despite JVM not being PDP-11, and definitely not trying to conserve every precious byte of RAM.
Apparently it’s nothing specific about arrays or ++ — any expression on the left hand side of the compound operator is evaluated twice, so any side effects happen twice.
If you read the original post to the end, you'll notice a piece of code that seemingly refutes this statement. That is, in case of Java 8, the index expression is evaluated once. and in case of Java 9, twice.
What does that refute? You seem to be agreeing with the post you're replying to.
But importantly your original point about "depending on order of their evaluation" is wrong. Most of the examples don't care about order at all. They break because things happen the wrong number of times.
Do you have a reference for this? If I remember correctly the Java spec says nothing definite about evaluation order and the memory model mentions reordering explicitely ("happens before" relationships rely on a memory barrier; without that, no guarantees).
Thanks. What I had in mind applies to distinct statements not expressions.
Funnily the chapter starts with the following caveat:
It is recommended that code not rely crucially on this specification. Code is usually clearer when each expression contains at most one side effect, as its outermost operation, and when code does not depend on exactly which exception arises as a consequence of the left-to-right evaluation of expressions.
It's worth noting: the bug in question only occurs when it's an array of String[]. Strings are a special case for just about everything in the compiler. :(
I used that operator in a tight loop before the compiler was smart enough to automatically convert the code to use a StringBuilder. Boy was that slow...
I didn't touch much Java Code for ~ 10 years now. I remember reading about the string concatenation problem in the Effective Java book. Using a StringBuilder was the very first thing I usually tried when I needed to improve performance on some unfamiliar code.
I didn't know the Java compiler converts this automatically now. Since which version is this? I suspect there might be still edge cases where it doesn't work.
To be marginally less facetious, Rust would "fix" this problem by dint of deliberately neither having a ++ operator (in either post- or pre-increment varieties) nor by allowing assignment operators (including augmented assignment ops) to evaluate to the value of their RHS, so the example from the OP would have to be written in a different fashion. :P (Rust has previously had bugs of its own related to the evaluation order of += and friends, it seems like a problem that nearly every language has at some point...)
[2]> (macroexpand '(incf (aref a (incf i)) x))
(LET*
((#:G3214 A) (#:G3215 (INCF I)) (#:G3217 X)
(#:G3216 (+ (AREF #:G3214 #:G3215) #:G3217)))
(SYSTEM::STORE #:G3214 #:G3215 #:G3216)) ;
T
You can't just blindly expand the += by copying the expressions/AST node on the left. Or in general; you don't take expressions in the program and blindly duplicate them. How can someone working on compilers not know that???
I see what you're getting at, but that's not the mistake that was made here. People working on it knew what it should do. It's in the spec so Java agrees with you.
This bug is a little narrower than this makes it out to be, it ONLY fails when the array on the left is a String[ ].
The way String operations get compiled down to bytecode changed a lot around those releases so the bug is more like an unforeseen interaction between optimization passes than a naive MST/Macro implementation.
The "funny" thing here to me is the surprising gap in the regression testing given the quality of most of the JDK.
go figure
Yea... what was the Bloch's explanation for not having operator overloading in Java? I think it was something like "it's not allowed because I've seen too many programmers who did that in a wrong way" :)
On the other hand this explanation is stupid, as a programmer can implement `add()` in a wrong way.
For the unfamiliar, the way string concatenation is compiled changed with invokedynamic.
Previously, it would compile to either Sting.concat calls or transform into a StringBuilder, but with invoke dynamic we can leave it to the runtime to optimize when it first runs.