Here is an explanation of the change to `substring()`[0] and why it was made. The change shipped in Java 7u6.
In short, the previous approach of keeping the same underlying character array and just updating the {offset, count} indexes has a drawback: if the original string is large, it cannot be GC'd as long as you keep a reference to even a single substring generated from it.
So it's a trade-off between the original and new behaviour. The original way more or less caps memory usage at the size of the original string, but at the expense of not being able to GC it while even a single substring exists; the new way increases memory usage for each substring generated, but does not prevent any of the strings from being GC'd.
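To make the trade-off concrete, here is a rough sketch of the two strategies. This is simplified and hypothetical, not the actual OpenJDK source; `OldStyleString` and its method names are just stand-ins for the pre-7u6 field layout:

```java
import java.util.Arrays;

// Hypothetical stand-in for the pre-7u6 String layout: a char[] plus
// {offset, count} indexes into it. Not the actual OpenJDK code.
final class OldStyleString {
    final char[] value;   // possibly shared with other strings
    final int offset;     // where this string starts within value
    final int count;      // how many characters it spans

    OldStyleString(char[] value, int offset, int count) {
        this.value = value;
        this.offset = offset;
        this.count = count;
    }

    // Pre-7u6 behaviour: O(1) -- just new indexes over the same array,
    // which keeps the entire parent array reachable.
    OldStyleString substringShared(int begin, int end) {
        return new OldStyleString(value, offset + begin, end - begin);
    }

    // 7u6+ behaviour: copy exactly the requested slice, so the parent
    // array can be GC'd once the parent string itself is unreachable.
    String substringCopying(int begin, int end) {
        return new String(Arrays.copyOfRange(value, offset + begin, offset + end));
    }
}
```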
This is why the article's code example yields such a huge difference in memory usage between Java 6 and Java 7: its pattern of iterating through a large string and generating lots of substrings is effectively a sort of "anti-pattern" when used against the new `substring()` method.
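A minimal, self-contained illustration of that pattern (a hypothetical example written for this comment, not the article's actual benchmark; the sizes are only illustrative):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class SubstringDemo {
    public static void main(String[] args) {
        // A 1M-character string (~2 MB of char data) to slice up.
        char[] chars = new char[1000000];
        Arrays.fill(chars, 'a');
        String large = new String(chars);

        // Take an overlapping 10,000-character window every 100 characters
        // and keep every window -- roughly 10,000 substrings in total.
        List<String> windows = new ArrayList<String>();
        for (int i = 0; i + 10000 <= large.length(); i += 100) {
            windows.add(large.substring(i, i + 10000));
        }
        System.out.println("kept " + windows.size() + " substrings");

        // Java 6:    every window shares large's single ~2 MB char[], so the
        //            extra memory is only the ~10,000 small String objects
        //            (but that array is pinned as long as any window lives).
        // Java 7u6+: every window copies its own 10,000-char array, which is
        //            on the order of 200 MB of character data in total.
    }
}
```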
The article I linked to, which came out in late 2012, basically had the same advice:
"If you are writing parsers and such, you can not rely any more on the implicit caching provided by String. You will need to implement a similar mechanism based on buffering and a custom implementation of CharSequence"
What bothers me is that there is now no way to get the old behavior. String is a final class, so you cannot even subclass it to add a field. You can roll your own - if there is no code you do not control that takes a string. (And if you don't mind having to write your own string class!)
And it being done in a "bugfix" release? That's unacceptable.
I agree - echoing the sentiments of another commenter here, I feel that one of the tenets of Java is backwards compatibility. While the change doesn't affect functionality, it can turn code that previously had O(1) space complexity per `substring()` call into code that is O(n) in the length of the substring. This is probably a Bad Thing.
Conversely, for people new or somewhat new to the language, the change probably makes sense from the principle of least surprise. From the start you're taught that Strings are immutable objects, so you probably understand that `.substring()` produces a new, independent object. Not having the original memory freed once you remove all references to the original string would likely be puzzling at first.
In this respect, the Java/Oracle folks likely decided that optimizing for the parsing/tokenization use case (where you make lots of substrings from a large original string, so sharing the underlying character array makes sense) was more specialized and less frequent than the use case of just pulling a small substring from a much larger one and then discarding the large one.
> You can roll your own - if there is no code you do not control that takes a string.
You can roll your own for the standard String too, through the bootstrap class loader.
I don't know how many assumptions about the internals of the String class are baked into the JVM, but I think you could replace substring() relatively safely.
Given the pain that would be associated with rolling your own, why not make the case for a new method on the String class that provides the old behavior? Legacy applications would still need to change, but it would be a relatively straightforward mechanical replacement.
That requires storing an extra two fields per String (int length, offset;), which is costly. Users who need constant-time substrings can simply implement their own class `Subsequence implements CharSequence` with a constructor taking a CharSequence and two ints. Users who need to pass a substring to a foreign function that only accepts String do need to copy, but that's not a major enough use case to justify upping the memory usage of most applications.
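As a rough sketch of that idea (a hypothetical class; the name `Subsequence` comes from the comment above, the constructor and method details are mine):

```java
// Hypothetical constant-time "substring" view over a parent sequence.
// Note it deliberately reintroduces the old trade-off: holding a
// Subsequence keeps the whole parent reachable.
public final class Subsequence implements CharSequence {
    private final CharSequence parent;
    private final int start;   // inclusive
    private final int end;     // exclusive

    public Subsequence(CharSequence parent, int start, int end) {
        if (start < 0 || end > parent.length() || start > end) {
            throw new IndexOutOfBoundsException(start + ".." + end);
        }
        this.parent = parent;
        this.start = start;
        this.end = end;
    }

    @Override public int length() {
        return end - start;
    }

    @Override public char charAt(int index) {
        return parent.charAt(start + index);
    }

    @Override public CharSequence subSequence(int from, int to) {
        if (from < 0 || to > length() || from > to) {
            throw new IndexOutOfBoundsException(from + ".." + to);
        }
        // Still O(1): just narrower indexes over the same parent.
        return new Subsequence(parent, start + from, start + to);
    }

    @Override public String toString() {
        // The copy is paid only here, when an actual String is required.
        return parent.subSequence(start, end).toString();
    }
}
```

Anything that only needs a CharSequence (e.g. regex matching via Matcher, or Appendable.append) can consume it directly; only a call that insists on a String forces the copy via toString().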
0. http://www.javaadvent.com/2012/12/changes-to-stringsubstring...