
From the article: The reaction of the Java developer community to our report is somewhat disappointing: instead of using our fixed (and verified!) version of mergeCollapse(), they opted to increase the allocated runLen "sufficiently". As we showed, this is not necessary. In consequence, whoever uses java.util.Collections.sort() is forced to over-allocate space. Given the astronomical number of program runs that such a central routine is used in, this leads to a considerable waste of energy.
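
For reference, the verified fix strengthens the invariant check to look one level deeper into the run stack, so a merge is triggered before the stack of pending runs can outgrow its allocation. A sketch of the paper's fixed mergeCollapse(), reconstructed from memory of the published fix (the fields stackSize and runLen and the method mergeAt are as in OpenJDK's TimSort), so treat the details as approximate:

    // Inside java.util.TimSort: merge pending runs until the stack
    // invariants hold. The fixed version also checks runLen[n-2],
    // which is what keeps the stack from growing past its allocation.
    private void mergeCollapse() {
        while (stackSize > 1) {
            int n = stackSize - 2;
            if (   (n >= 1 && runLen[n - 1] <= runLen[n] + runLen[n + 1])
                || (n >= 2 && runLen[n - 2] <= runLen[n] + runLen[n - 1])) {
                if (runLen[n - 1] < runLen[n + 1])
                    n--;                    // merge toward the smaller neighbour
            } else if (runLen[n] > runLen[n + 1]) {
                break;                      // invariants established
            }
            mergeAt(n);                     // merge runs n and n+1
        }
    }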



It looks like the Python version that's good for 2^49 items uses 1360 bytes for this array, allocated on the stack.

I wouldn't worry about an extra couple of bytes nearly as much as I would worry about changing the behaviour of a function used in an "astronomical number of programs", so this looks like a pretty reasonable and conservative choice, at least for an immediate patch.


Does it "change behavior"? As far as I understand, the only change is that this test wouldn't crash anymore. Which certainly is "changing the behavior", but allocating more memory is as well. If so, I don't see any reason not to use formally verified version. Not that it is really important, but quite reasonable.


They fixed it. Whether or not it was the "proper" fix (out of the two proposed solutions) is a complicated question. The maximum working-array length was changed from 40 ints to 49. In this case the difference means that not even an additional cache line is needed, and when the extra space matters at all, it is only for arrays longer than ~120K elements. This may have been judged better than adding more branches (and longer code) to each iteration, branches that would need to run no matter the input size.
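
For context, the fix that shipped is just a bump to the run-stack sizing table in TimSort's constructor. Roughly (thresholds and values quoted from memory of the OpenJDK patch, so treat them as approximate):

    // java.util.TimSort constructor: the stack of pending runs is sized
    // by input length; the top tier went from 40 to 49 ints (36 bytes),
    // and only inputs of ~120K+ elements ever reach it.
    int stackLen = (len <    120 ?  5 :
                    len <   1542 ? 10 :
                    len < 119151 ? 24 : 49);
    runBase = new int[stackLen];
    runLen  = new int[stackLen];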

My guess is that they ran a JMH benchmark with each proposed fix and picked the better one. In either case, I doubt it makes much difference.
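
For anyone curious what such a comparison might look like: a minimal JMH sketch (the class name, sizes, and harness setup here are my own assumptions, not anything the JDK team published). Run it once against a JDK built with each candidate fix and compare:

    import java.util.Arrays;
    import java.util.Random;
    import java.util.concurrent.TimeUnit;
    import org.openjdk.jmh.annotations.*;

    @BenchmarkMode(Mode.AverageTime)
    @OutputTimeUnit(TimeUnit.MICROSECONDS)
    @State(Scope.Thread)
    public class TimSortBench {
        // Sizes straddle the ~120K boundary where the larger stack is allocated.
        @Param({"1000", "200000"})
        int size;

        Integer[] data;

        @Setup(Level.Trial)
        public void setup() {
            // Boxed elements: Arrays.sort(Object[]) is the TimSort path in the
            // JDK; primitive int[] would take the dual-pivot quicksort instead.
            data = new Random(42).ints(size).boxed().toArray(Integer[]::new);
        }

        @Benchmark
        public Integer[] sortCopy() {
            Integer[] copy = data.clone(); // sort a fresh copy every invocation
            Arrays.sort(copy);
            return copy;
        }
    }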


Which costs more, doubling the number of runLen comparisons or allocating more memory? Without knowing the answer it's hard to say which is the better solution (assuming more memory actually fixes the bug).


Exchanging memory usage for equality comparisons is an almost universal no-brainer. The number of comparisons would have to increase hundreds of times to compensate for doubling the memory usage... and that's just for performance; for energy savings you'd need even more.


They've added 36 bytes (9 extra ints × 4 bytes) for input sizes > 120K elements. That's not even one cache line. This, instead of a few more branches that would need to run on every iteration, no matter the input size. I haven't run a benchmark (they probably did), but my gut says they made the right choice.


Sounds like the amount of memory involved is quite small. Below a certain threshold, doubling it is practically free. And IIUC, over-allocating doesn't necessarily mean all of the memory is actually used; that happens only for the inputs that trigger the bug. The extra comparison, however, would be performed for all inputs.


Below what threshold though? L1 cache size? O(64 bytes), no?


Depends on what the rest of the program is doing. 64 bytes is the size of a cache line; the actual L1 cache (on recent Intel processors) is 1024 of those cache lines. If the extra memory just means you pull one more line into cache for a bookkeeping array, the effect is negligible. If it means you thrash the cache and have to evict some of your sorted data, the effect could be orders of magnitude larger.

I'd really hope that the JDK maintainers benchmarked against real-world programs before deciding which fix to use. My intuition is that the set of programs where the increased memory usage matters is tiny, but the effect is large for them: it's the set of programs whose working set is greater than the L1 cache size. Equality comparisons aren't free either; they can often cause branch mispredicts. But performance tuning on real-world programs is very counterintuitive, so the best way to do it is to run actual numbers on actual programs.


All Intel CPUs worth mentioning in the last 10 years have had 512 L1 data-cache lines (512 × 64 B = 32 KB), not 1024. Including Haswell and Sandy Bridge.

I do agree with your analysis otherwise.


The extra comparison, too, ought to return the same value on all but the inputs that trigger the bug, so it should be almost perfectly predicted.


Memory isn't being doubled, though; a bookkeeping array that is a small fraction of the total memory in use is being grown by less than 2x.


What do you base these numbers on? If everything still fits inside the L1 cache, I'd assume the extra comparisons would have a relatively bigger impact?


Yes, that's right. The thing is that in nearly all cases where you want to sort arrays that fit inside the L1 cache, neither execution time nor allocation size is a problem.


Really symptomatic of the way the Java developer community works.




It is widely known that every corporate developer has endless time for his work. Mundane aspects like development costs should have no impact on our code, the purity of which is a goal in itself.


I thought it was Node.js evangelists who did that.


Java did that with data. Node.js did it with your file system (NPM, I'm looking at you).



