
From the article: The reaction of the Java developer community to our report is somewhat disappointing: instead of using our fixed (and verified!) version of mergeCollapse(), they opted to increase the allocated runLen "sufficiently". As we showed, this is not necessary. In consequence, whoever uses java.util.Collections.sort() is forced to over-allocate space. Given the astronomical number of program runs that such a central routine is used in, this leads to a considerable waste of energy.
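
For reference, the verified fix strengthens the invariant check to look one level deeper into the run stack, so a merge is triggered before the stack of pending runs can outgrow its allocation. A sketch of the paper's fixed mergeCollapse(), reconstructed from memory of the published fix (the fields stackSize and runLen and the method mergeAt are as in OpenJDK's TimSort), so treat the details as approximate:

    // Inside java.util.TimSort: merge pending runs until the stack
    // invariants hold. The fixed version also checks runLen[n-2],
    // which is what keeps the stack from growing past its allocation.
    private void mergeCollapse() {
        while (stackSize > 1) {
            int n = stackSize - 2;
            if (   (n >= 1 && runLen[n - 1] <= runLen[n] + runLen[n + 1])
                || (n >= 2 && runLen[n - 2] <= runLen[n] + runLen[n - 1])) {
                if (runLen[n - 1] < runLen[n + 1])
                    n--;                    // merge toward the smaller neighbour
            } else if (runLen[n] > runLen[n + 1]) {
                break;                      // invariants established
            }
            mergeAt(n);                     // merge runs n and n+1
        }
    }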



It looks like the Python version that's good for 2^49 items uses 1360 bytes for this array, allocated on the stack.

I wouldn't worry about an extra couple of bytes nearly as much as I would worry about changing the behaviour of a function used in an "astronomical number of programs", so this looks like a pretty reasonable and conservative choice, at least for an immediate patch.


Does it "change behavior"? As far as I understand, the only change is that this test wouldn't crash anymore. Which certainly is "changing the behavior", but allocating more memory is as well. If so, I don't see any reason not to use formally verified version. Not that it is really important, but quite reasonable.


They fixed it. Whether or not it was the "proper" fix (out of the two proposed solutions) is a complicated question. The maximum working-array length was changed from 40 ints to 49. In this case the difference means that not even an additional cache line is needed, and when the extra space matters at all, it is only for arrays longer than ~120K elements. This may have been judged better than adding more branches (and longer code) to each iteration, branches that would need to run no matter the input size.
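
For context, the fix that shipped is just a bump to the run-stack sizing table in TimSort's constructor. Roughly (thresholds and values quoted from memory of the OpenJDK patch, so treat them as approximate):

    // java.util.TimSort constructor: the stack of pending runs is sized
    // by input length; the top tier went from 40 to 49 ints (36 bytes),
    // and only inputs of ~120K+ elements ever reach it.
    int stackLen = (len <    120 ?  5 :
                    len <   1542 ? 10 :
                    len < 119151 ? 24 : 49);
    runBase = new int[stackLen];
    runLen  = new int[stackLen];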

My guess is that they ran a JMH benchmark with each proposed fix and picked the better one. In either case, I doubt it makes much difference.
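
For anyone curious what such a comparison might look like: a minimal JMH sketch (the class name, sizes, and harness setup here are my own assumptions, not anything the JDK team published). Run it once against a JDK built with each candidate fix and compare:

    import java.util.Arrays;
    import java.util.Random;
    import java.util.concurrent.TimeUnit;
    import org.openjdk.jmh.annotations.*;

    @BenchmarkMode(Mode.AverageTime)
    @OutputTimeUnit(TimeUnit.MICROSECONDS)
    @State(Scope.Thread)
    public class TimSortBench {
        // Sizes straddle the ~120K boundary where the larger stack is allocated.
        @Param({"1000", "200000"})
        int size;

        Integer[] data;

        @Setup(Level.Trial)
        public void setup() {
            // Boxed elements: Arrays.sort(Object[]) is the TimSort path in the
            // JDK; primitive int[] would take the dual-pivot quicksort instead.
            data = new Random(42).ints(size).boxed().toArray(Integer[]::new);
        }

        @Benchmark
        public Integer[] sortCopy() {
            Integer[] copy = data.clone(); // sort a fresh copy every invocation
            Arrays.sort(copy);
            return copy;
        }
    }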


Which costs more, doubling the number of runLen comparisons or allocating more memory? Without knowing the answer it's hard to say which is the better solution (assuming more memory actually fixes the bug).


Exchanging memory usage for equality comparisons is an almost universal no-brainer. The number of comparisons would have to increase hundreds of times to compensate for doubling the memory usage... and that's just for performance; for energy savings you'd need even more.


They've added 36 bytes (9 extra ints × 4 bytes) for input sizes > 120K elements. That's not even one cache line. This, instead of a few more branches that would need to run on every iteration, no matter the input size. I haven't run a benchmark (they probably did), but my gut says they made the right choice.


Sounds like the amount of memory involved is quite small. Below a certain threshold, doubling it is practically free. And IIUC, over-allocating doesn't necessarily mean all of the memory is actually used; that happens only for the inputs that trigger the bug. The extra comparison, however, would be performed for all inputs.


Below what threshold though? L1 cache size? O(64 bytes), no?


Depends on what the rest of the program is doing. 64 bytes is the size of a cache line; the actual L1 cache (on recent Intel processors) is 1024 of those cache lines. If the extra memory just means you pull one more line into cache for a bookkeeping array, the effect is negligible. If it means you thrash the cache and have to evict some of your sorted data, the effect could be orders of magnitude larger.

I'd really hope that the JDK maintainers benchmarked against real-world programs before deciding which fix to use. My intuition is that the set of programs where the increased memory usage matters is tiny, but the effect is large for them: it's the set of programs whose working set is greater than the L1 cache size. Equality comparisons aren't free either; they can often cause branch mispredicts. But performance tuning on real-world programs is very counterintuitive, so the best way to do it is to run actual numbers on actual programs.


All Intel CPUs worth mentioning in the last 10 years have had 512 L1 data-cache lines (512 × 64 B = 32 KB), not 1024. Including Haswell and Sandy Bridge.

I do agree with your analysis otherwise.


The extra comparison, too, ought to return the same value on all but the inputs that trigger the bug, so it should be almost perfectly predicted.


Memory isn't being doubled, though; a bookkeeping array that is a small fraction of the total memory in use is being grown by less than 2x.


What do you base these numbers on? If everything still fits inside the L1 cache, I'd assume the extra comparisons would have a relatively bigger impact?


Yes, that's right. The thing is that in nearly all cases where you want to sort arrays that fit inside the L1 cache, neither execution time nor allocation size is a problem.


Really symptomatic of the way the Java developer community works.




It is widely known that every corporate developer has endless time for his work. Mundane aspects like development costs should have no impact on our code, the purity of which is a goal in itself.


I thought it was Node.js evangelists who did that.


Java did that with data. Node.js did it with your file system (NPM, I'm looking at you).



