It seems to me that the issue is that $i++ is nonsensical if $i is a string (and $i is a string, not a character). But instead of raising an error, PHP soldiers on and tries to apply some completely unexpected function.
It's actually occasionally useful if you're trying to gensym a string. In a normal language, you need to write your own gensym function - in PHP, you know that $s++ will create a sequence of unique strings, suitable for use in b64-only environments, that will never run out.
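For anyone curious what that sequence looks like, here is a minimal Python sketch of a generator that produces the same kind of never-ending series for lowercase strings (a, b, ..., z, aa, ab, ...). The name `gensym` is only borrowed from the comment above; this is a model of the idea, not how PHP implements `$s++`.

```python
from itertools import count, islice, product
from string import ascii_lowercase


def gensym():
    """Yield 'a', 'b', ..., 'z', 'aa', 'ab', ... without ever repeating."""
    for length in count(1):  # 1-letter names, then 2-letter names, and so on
        for letters in product(ascii_lowercase, repeat=length):
            yield "".join(letters)


# Peek at the names around the point where the sequence grows to two letters.
print(list(islice(gensym(), 24, 29)))  # ['y', 'z', 'aa', 'ab', 'ac']
```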
Perl does this, too. And is it really that unexpected? How many possible things could 'z'+1 return? The only possibilities that make sense to me are "{", "aa", or an error. I like PHP because it's flexible, and this often makes life easier. If you're really going to try to use strings as integers, then you deserve what's coming to you. If you're going to use this behavior as a feature, then you've just saved yourself a chunk of time.
More sensible would be to not overload the ++ or +1 operator to do this but provide a different function. Then the capability is there but something unexpected doesn't happen in the presence of type errors.
I would certainly expect it to go through the charset in order instead of jumping back to a, but if you think of a string as a base-256 (or base-26 in this case) number, then it makes sense.
Well, the choice was between having 'z'++ go to 'aa' vs. '{'. In most cases when you are comparing strings you want to do so alphabetically, not by their underlying character code. As mentioned in the thread there, Perl made the same decision. If you want to compare by character code, use ord() on it.
Writing a loop like that is also a bit of a weird edge-case. You would typically just call range('a','z') to get an array of a through z.
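To make the two candidate results concrete, here is a rough Python sketch. `next_by_code` and `next_alphabetic` are made-up names, and the second one only handles lowercase a-z, so it is a simplified model of the 'z' to 'aa' rule rather than PHP's actual implementation.

```python
def next_by_code(ch: str) -> str:
    """Advance a single character by its character code."""
    return chr(ord(ch) + 1)


def next_alphabetic(s: str) -> str:
    """Advance a lowercase a-z string like an odometer: 'z' rolls over and carries left."""
    if not s or any(c < "a" or c > "z" for c in s):
        raise ValueError("this sketch only handles non-empty lowercase a-z strings")
    chars = list(s)
    i = len(chars) - 1
    while i >= 0:
        if chars[i] != "z":
            chars[i] = chr(ord(chars[i]) + 1)
            return "".join(chars)
        chars[i] = "a"  # roll over and carry into the position to the left
        i -= 1
    return "a" + "".join(chars)  # carried past the first letter: grow by one


print(next_by_code("z"))      # {
print(next_alphabetic("z"))   # aa
print(next_alphabetic("az"))  # ba
```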
Please explain: what makes this choice "correct?" Is it because it's your personal preference?
Different languages have different features. I've been coding PHP for years, and I've never encountered this feature before. I suspect that's true because I don't usually treat strings like integers. However, now that I know it's possible, I might actually use it to generate random strings or do something useful. I don't see it as something that is incorrect, I see it as how the language is implemented.
$x++ doesn't overflow (repeat a value you've already seen), yet produces a value less than the prior $x. This means PHP has two different total orderings over strings, one of which is inaccessible except by brute force. I find that indefensibly wrong. The odds I wanted this are zero, so silently getting it anyway is the kind of ridiculous misbehavior that makes me glad I can avoid the language.
> The odds I wanted this are zero, so silently getting it anyway is the kind of ridiculous misbehavior that makes me glad I can avoid the language.
Not all useful values have to be order-able. If you just want to generate dictionary keys or unique IDs, you don't care about the ordering property of the values at all. And if you do care, all that matters is that ordering a given set of values is consistent. This behavior satisfies that criterion.
There are two things to know. 'aa' < 'z' , which I think is expected. And 'z'+1 == 'aa', which is debatable, but not far-fetched.
As far as language blunders go, I don't see this as unforgivable, especially since it only appears when you're treating a string as an integer (a dumb idea, anyway) from within the for construct.
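The two orderings being argued about can be shown side by side. This is a small Python sketch, assuming lowercase-only strings: plain string comparison puts 'aa' before 'z', while a "shorter first, then alphabetical" key (often called shortlex; `shortlex_key` is a made-up helper name) gives the order in which repeated incrementing would visit the same strings.

```python
def shortlex_key(s: str):
    """Sort key: shorter strings first, ties broken by ordinary comparison."""
    return (len(s), s)


words = ["aa", "z", "b", "ab", "a"]

print("aa" < "z")                       # True
print(sorted(words))                    # ['a', 'aa', 'ab', 'b', 'z']
print(sorted(words, key=shortlex_key))  # ['a', 'b', 'z', 'aa', 'ab']
```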
Giving this further thought, I think languages with mutable strings can overload the increment and decrement operators as a way to resize buffers, or move the fill pointer. In C++ this would actually make perfect sense, for implementing high-performance strings. The ++ can be a synonym for realloc, and the base pointer to the string, along with its length and encoding, can be stored in an opaque "struct string" handle. Fast, secure Pascal-style strings.
Indeed, but I said "for implementing high-performance strings."
I would love to hear about the OP's ideas for faster variable-length and adjustable string implementation using C++ member or friend functions (the C++ implementation of "methods".)
When it comes to performance, the same design I detailed is used by all the "high-per" libraries that I know of. If we're talking API aesthetics, fast implementations can always be wrapped with something more attractive.
P.S. Let me make it clear that I am not advocating operator overloading for resizing strings and that the OP actually does have a point. (Sorry if I came off as a douche in that regard.) But implementing fast strings as classes + methods is not really the fastest way in C++.
I'm getting the impression you don't know C++ that well. A "member" function or "method" in C++ is just a function that has access to other members of its class. If you don't add the virtual keyword there doesn't have to be any difference between it and a normal function (besides the mentioned scoping rules) and it could even be inlined. Overloaded operators aren't any faster than functions.
PHP is like that. Few other languages would feel comfortable with simultaneously defining 'z'+1 as 'aa', while asserting that 'aa' is strictly less than 'z'.
It's not that 'aa' would make any sense being greater than 'z', it's just the non-intuitive result that comes out of the combination of those features.
To my mind it's a good example of how languages that try to throw in every feature and the kitchen sink end up acquiring a lot of bizarre warts from unforeseen interactions.
Note that there is no such thing as "the alphabetical ordering". Different languages define different collations, and some even multiple ones (e.g., German collation vs German phone book collation, or the various collation systems for Chinese characters). I'm pretty sure PHP's comparison operator will define non-ASCII characters as being outside of the alphabet, and probably just fail on multi-byte strings (UTF-8).
So if you are doing comparisons on strings, you probably either have an i18n bug or a really, really specific use case.
Clearly you have not ventured too far outside the "safe" en-US cultures ;)
In the Scandinavian languages (Danish, Swedish and Norwegian) "aa" is (for whatever historical reasons) considered a synonym for "å", which is the last letter of the alphabet. In the same way "ae" maps to "æ" (third last) and "oe" maps to "ø" (second last).
Hence, using culture-aware sorting, the following array may actually not be sorted: [ "a", "aa", "ae", "b", "oe", "of" ]. You will find similar sort behaviour in a lot of databases when you set the database/table/column collation to other cultures.
While I agree the PHP implementation is silly, your blanket dismissal of the possibility that 'aa' can be greater than 'z' is equally ignorant.
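As an illustration of the sorting point, here is a toy Python sketch that applies exactly the rules described in this comment ("aa", "ae" and "oe" count as å, æ and ø, which come after z in the order æ, ø, å). It is not a real collation library, `fold` and `collation_key` are made-up names, and it raises KeyError for characters outside its small alphabet.

```python
ALPHABET = "abcdefghijklmnopqrstuvwxyzæøå"
RANK = {ch: i for i, ch in enumerate(ALPHABET)}
DIGRAPHS = {"aa": "å", "ae": "æ", "oe": "ø"}  # mapping as stated in the comment


def fold(word: str) -> str:
    """Replace the two-letter spellings with the single letters they stand for."""
    out = []
    i = 0
    while i < len(word):
        pair = word[i:i + 2]
        if pair in DIGRAPHS:
            out.append(DIGRAPHS[pair])
            i += 2
        else:
            out.append(word[i])
            i += 1
    return "".join(out)


def collation_key(word: str):
    """Rank each folded letter by its position in ALPHABET."""
    return [RANK[ch] for ch in fold(word.lower())]


words = ["a", "aa", "ae", "b", "oe", "of"]
print(sorted(words, key=collation_key))  # ['a', 'b', 'of', 'ae', 'oe', 'aa']
```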
A couple of years ago I had to devise PHP sort algorithms for Danish, German, and Russian. (Unsurprisingly, native speakers are essential for testing this kind of thing ;) )
It was surprising at the time that these sort algorithms were not generally available in code. We also found that we couldn't rely on the db (MySQL) to do the right thing, which was not helped, of course, by having these character sets, and others, in the db. The db was fully utf8.
Once you grok PHP's underlying method, it's simply a case of using the above mentioned technique -- replacing the "foreign" symbols with character pairs -- to establish the correct order.
In the Danish case above, for example, I used zx, zy, and zz.
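A rough Python sketch of that substitution trick follows. Which letter gets which pair is my assumption (æ, ø, å in alphabet order mapped to zx, zy, zz), and `sort_key` is a made-up name. The idea is that after the replacement an ordinary string comparison puts the three extra letters after every normal word starting with z.

```python
# Assumed mapping: the comment names the pairs but not which letter gets which.
PAIRS = {"æ": "zx", "ø": "zy", "å": "zz"}


def sort_key(word: str) -> str:
    """Swap each special letter for its stand-in pair, then compare as plain text."""
    return "".join(PAIRS.get(ch, ch) for ch in word.lower())


words = ["år", "zebra", "æble", "øl", "abe"]
print(sorted(words))                # ['abe', 'zebra', 'år', 'æble', 'øl']
print(sorted(words, key=sort_key))  # ['abe', 'zebra', 'æble', 'øl', 'år']
```

One caveat with this sketch: a word that already contains "zx", "zy" or "zz" (say "pizza") can be misordered against a word containing the special letters, so a real implementation would need stand-ins that cannot occur in the data.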
Not really: after $p = $z++, $p is 'z' and $z is 'aa', as you'd kind-of expect. But the difference between literals and expressions/variable references of course just adds icing to the cake of bizarreness.
exhibits the exact same problem, which all C's integral types have. So unless you embrace Bignums Everywhere (or at least automatic promotion to bignum and therefore Potential Bignums Everywhere Except Where The Compiler Can Prove Otherwise) it's hard to avoid.
Eh, I happen to think the Perl behavior is just as ridiculous. Defining the addition operator on a string and an integer to create some kind of complete lexicographical ordering is completely counterintuitive and, moreover, is probably not useful enough to warrant such behavior.
This actually makes perfect sense to me. Knowing this would have really come in handy a while back when I was creating my dynamic excel spreadsheet generator in PHP. While it doesn't follow ASCII char indexes, it follows a base 26 numbering system.
Edit: Oh... and if anyone was wondering, PHP's chr() function (given a character code) should behave similarly to incrementing an ASCII char in C/C++.
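For the spreadsheet case, the mapping from a column number to its letters can be sketched directly. This is a minimal Python version, assuming 1-based column numbers; `column_name` is a hypothetical helper, not something from PHP or any spreadsheet library. Strictly speaking the scheme is "bijective" base 26, since there is no zero digit: after Z comes AA, not BA.

```python
def column_name(index: int) -> str:
    """Convert a 1-based column number to spreadsheet letters (1 -> 'A', 27 -> 'AA')."""
    if index < 1:
        raise ValueError("column numbers start at 1")
    letters = []
    while index > 0:
        # Subtract 1 first because the digits run A..Z with no zero digit.
        index, remainder = divmod(index - 1, 26)
        letters.append(chr(ord("A") + remainder))
    return "".join(reversed(letters))


print([column_name(n) for n in (1, 26, 27, 52, 53, 702, 703)])
# ['A', 'Z', 'AA', 'AZ', 'BA', 'ZZ', 'AAA']
```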