Why doesn't this code simply return a to z as the output? (stackoverflow.com)
59 points by hardik988 on Nov 5, 2010 | 46 comments

It seems to me that the issue is that $i++ is nonsensical if $i is a string (and $i is a string, not a character). But instead of raising an error, PHP soldiers on and tries to apply some completely unexpected function.

It's actually occasionally useful if you're trying to gensym a string. In a normal language, you need to write your own gensym function - in PHP, you know that $s++ will create a sequence of unique strings, suitable for use in b64-only environments, that will never run out.
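A minimal sketch of the gensym idea described above. The helper name `make_gensym` and its closure-based shape are illustrative assumptions, not something from the thread; it relies only on `++` applied to an alphanumeric string.

```php
<?php
// Hypothetical helper (not from the thread): returns a closure that hands
// out successive strings by applying ++ to the previous one.
function make_gensym(string $start = 'a'): callable
{
    $current = $start;
    return function () use (&$current): string {
        $name = $current;
        $current++; // 'a' -> 'b', ..., 'z' -> 'aa', 'az' -> 'ba'
        return $name;
    };
}

$gensym = make_gensym('x');
$names = [];
for ($n = 0; $n < 5; $n++) {
    $names[] = $gensym();
}
echo implode(' ', $names), "\n"; // x y z aa ab
```

Because the string grows by one character on carry instead of wrapping around, the sequence does not repeat.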

Perl does this, too. And is it really that unexpected? How many possible things could 'z'+1 return? The only possibilities that make sense to me are "{", "aa", or an error. I like PHP because it's flexible, and this often makes life easier. If you're really going to try to use strings as integers, then you deserve what's coming to you. If you're going to use this behavior as a feature, then you've just saved yourself a chunk of time.

>How many possible things could 'z'+1 return?

That are less than or equal to z?

To z or to 'z'?


More sensible would be to not overload the ++ or +1 operator to do this but provide a different function. Then the capability is there but something unexpected doesn't happen in the presence of type errors.

i would certainly expect it to go through the charset in order instead of jumping back to a, but if you think of a string as a base 256 (or 26 in this case) number then it makes sense

Well, the choice was between having 'z'++ go to 'aa' vs. '{'. In most cases when you are comparing strings you want to do so alphabetically, not by their underlying character code. As mentioned in the thread there, Perl made the same decision. If you want to compare by character code, use ord() on it.

Writing a loop like that is also a bit of a weird edge-case. You would typically just call range('a','z') to get an array of a through z.
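For reference, a short sketch of the `range()` call mentioned above; the variable name `$letters` is just for illustration.

```php
<?php
// range() with two single-character arguments yields the characters
// between them, inclusive, as an array.
$letters = range('a', 'z');

echo implode('', $letters), "\n"; // abcdefghijklmnopqrstuvwxyz
echo count($letters), "\n";       // 26
```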

There is a third, correct choice: don't allow incrementing of strings. Everything you say about comparing is true, but that's a separate issue.

Please explain: what makes this choice "correct?" Is it because it's your personal preference?

Different languages have different features. I've been coding PHP for years, and I've never encountered this feature before. I suspect that's true because I don't usually treat strings like integers. However, now that I know it's possible, I might actually use it to generate random strings or do something useful. I don't see it as something that is incorrect, I see it as how the language is implemented.

$x++ doesn't overflow (repeat a value you've already seen), yet produces a value less than the prior $x. This means PHP has two different total orderings over strings, one of which is inaccessible except by brute force. I find that indefensibly wrong. The odds I wanted this are zero, so silently getting it anyway is the kind of ridiculous misbehavior that makes me glad I can avoid the language.

> The odds I wanted this are zero, so silently getting it anyway is the kind of ridiculous misbehavior that makes me glad I can avoid the language.

Not all useful values have to be order-able. If you just want to generate dictionary keys or unique IDs, you don't care about the ordering property of the values at all. And if you do care, all that matters is that ordering a given set of values is consistent. This behavior satisfies that criterion.

There are two things to know. 'aa' < 'z' , which I think is expected. And 'z'+1 == 'aa', which is debatable, but not far-fetched.
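The two facts above can be checked directly. One caveat on notation: the carry to 'aa' comes from the `++` operator on a string variable; the expression `'z' + 1` is ordinary arithmetic and does not yield 'aa', so this sketch uses `++` only.

```php
<?php
$s = 'z';
$s++;                  // string increment carries like an odometer
var_dump($s);          // string(2) "aa"

// Two non-numeric strings are compared byte by byte, so 'aa' sorts first.
var_dump('aa' < 'z');  // bool(true)
```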

As far as language blunders go, I don't see this as unforgivable, especially since it only appears when you're treating a string as an integer (a dumb idea, anyway) from within the for construct.



Giving this further thought, I think languages with mutable strings can overload the increment and decrement operators as a way to resize buffers, or move the fill pointer. In C++ this would actually make perfect sense, for implementing high-performance strings. The ++ can be a synonym for realloc, and the base pointer to the string, along with its length and encoding, can be stored in an opaque "struct string" handle. Fast, secure Pascal-style strings.

This sounds awful and unintuitive. Stick with methods, please.

Huh? C++ doesn't have "methods".

What language are you really thinking of?

"Method" is OO parlance for a function that is associated with a class or object. C++ has those.

Indeed, but I said "for implementing high-performance strings."

I would love to hear about the OP's ideas for faster variable-length and adjustable string implementation using C++ member or friend functions (the C++ implementation of "methods".)

When it comes to performance, the same design I detailed is used by all the "high-per" libraries that I know of. If we're talking API aesthetics, fast implementations can always be wrapped with something more attractive.

P.S. Let me make it clear that I am not advocating operator overloading for resizing strings and that the OP actually does have a point. (sorry if I came off as a douche in that regard.) But implementing fast-strings as classes + methods is not really the fastest way in C++.

I'm getting the impression you don't know C++ that well. A "member" function or "method" in C++ is just a function that has access to other members of its class. If you don't add the virtual keyword there doesn't have to be any difference between it and a normal function (besides the mentioned scoping rules) and it could even be inlined. Overloaded operators aren't any faster than functions.

Why does it even execute z++? Shouldn't it stop at 'z' since that is set as the end condition of the for loop?

'z' + 1 is 'aa', which is less than 'z' lexicographically, so the loop condition still holds and the loop continues.

The for loop runs while $i is less than or equal to 'z'.

Also, one might wonder why it stops at 'yz' instead of continuing on forever.

There is no logic in this insanity — you won't know until you try it out.

I think it's because the next value, 'za', is neither equal to 'z' nor "less" than it, so the condition fails, and the loop ends.
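A sketch of the loop under discussion, assuming it has the shape `for ($i = 'a'; $i <= 'z'; $i++)` (the thread does not quote the original code). It collects the values instead of echoing them so the endpoints are easy to inspect.

```php
<?php
$seen = [];
for ($i = 'a'; $i <= 'z'; $i++) {
    $seen[] = $i;
}

// 26 single letters plus 25 * 26 two-letter strings from 'aa' to 'yz'.
echo count($seen), "\n"; // 676
echo $seen[25], ' ', $seen[26], ' ', $seen[count($seen) - 1], "\n"; // z aa yz

// The loop exits on the first value that compares greater than 'z'.
echo $i, "\n"; // za
```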

That is ridiculous.

PHP is like that. Few other languages would feel comfortable with simultaneously defining 'z'+1 as 'aa', while asserting that 'aa' is strictly less than 'z'.

While I agree that 'z'++ being 'aa' is a bit silly, I can't ever see anyone making the case that 'aa' should be greater than 'z'.

It's not that 'aa' would make any sense being greater than 'z', it's just the non-intuitive result that comes out of the combination of those features.

To my mind it's a good example of how languages that try to throw in every feature and the kitchen sink end up acquiring a lot of bizarre warts from unforeseen interactions.

As long as we're enforcing ordering on casually unordered sets, why not define it as length of the string? Now 'aa' is certainly greater than 'z'.

There's a pretty good case for making the comparison closer to alphabetical ordering. 'apple' before 'pear' even though 'apple' is the longer string.

'z' before 'zz' also follows alphabetical ordering.

Note that there is no such thing as "the alphabetical ordering". Different languages define different collations, and some even multiple ones (e.g., German collation vs German phone book collation, or the various collation systems for Chinese characters). I'm pretty sure PHP's comparison operator will define non-ASCII characters as being outside of the alphabet, and probably just fail on multi-byte strings (UTF-8).

So if you are doing comparisons on strings, you probably either have an i18n bug or a really, really specific use case.

That's what I was thinking, and in this case 'zz' would be > 'a'. I'd say that's a reasonable choice.

Clearly you have not ventured too far outside the "safe" en-US cultures ;)

In the Scandinavian languages (Danish, Swedish and Norwegian) "aa" is (for whatever historical reasons) considered a synonym for "å", which is the last letter of the alphabet. In the same way "ae" maps to "æ" (third last) and "oe" maps to "ø" (second last).

Hence, using culture-aware sorting, the following array may actually not be sorted: [ "a", "aa", "ae", "b", "oe", "of" ]. You will find similar sort behaviour in a lot of databases when you set the database/table/column collation to other cultures.

While I agree the PHP implementation is silly, your blanket dismissal of the possibility that 'aa' can be greater than 'z' is equally ignorant.

A couple of years ago I had to devise PHP sort algorithms for Danish, German, and Russian. (Unsurprisingly, native speakers are essential for testing this kind of thing ;) )

It was surprising at the time that these sort algorithms were not generally available in code. We also found that we couldn't rely on the db (MySQL) to do the right thing, which was not helped, of course, by having these character sets, and others, in the db. The db was fully utf8.

Once you grok PHP's underlying method, it's simply a case of using the above mentioned technique -- replacing the "foreign" symbols with character pairs -- to establish the correct order.

In the Danish case above, for example, I used zx, zy, and zz.
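A rough sketch of the character-pair technique described above, assuming lowercase UTF-8 input. The helper name `danish_sort_key` and the exact letter-to-pair mapping are assumptions: the comment names the pairs zx, zy, and zz but does not say which letter received which.

```php
<?php
// Assumed mapping: pairs assigned in Danish alphabet order (æ, ø, å),
// so each sorts after a plain 'z' under byte-wise comparison.
const DANISH_SORT_MAP = ['æ' => 'zx', 'ø' => 'zy', 'å' => 'zz'];

// Hypothetical helper: rewrite a word into a key that strcmp() orders
// the way the Danish alphabet does.
function danish_sort_key(string $word): string
{
    return strtr($word, DANISH_SORT_MAP);
}

$words = ['ø', 'z', 'å', 'a', 'æ'];
usort($words, fn($a, $b) => strcmp(danish_sort_key($a), danish_sort_key($b)));

echo implode(' ', $words), "\n"; // a z æ ø å
```

This ignores uppercase letters and the "aa" digraph, and a real word containing "zx", "zy", or "zz" would collide with the substituted keys, so it is an illustration of the idea only.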

'z'++ is not 'aa'. 'z'++ is not even valid PHP. $z='z';$p=$z++; does nothing either.

it's only in the for-loop context.

Not really: after $p = $z++, $p is 'z' and $z is 'aa', as you'd kind-of expect. But the difference between literals and expressions/variable references of course just adds icing to the cake of bizarreness.


<?php $z='z'; echo ++$z;

which produces 'aa'

Yeah, it's an absolutely ridiculous combination. On the other hand ...

  unsigned char c;
  for (c=0; c<=255; ++c) {
    printf(" %d", c);
  }
exhibits the exact same problem, which all C's integral types have. So unless you embrace Bignums Everywhere (or at least automatic promotion to bignum and therefore Potential Bignums Everywhere Except Where The Compiler Can Prove Otherwise) it's hard to avoid.

That's fascinating. I wonder what was the reason behind putting such a reasoning on string manipulation.

Because it isn't like C in that aspect but more like Perl?

Eh, I happen to think the Perl behavior is just as ridiculous. Defining the addition operator on a string and an integer to create some kind of complete lexicographical ordering is completely counterintuitive and, moreover, is probably not useful enough to warrant such behavior.

This actually makes perfect sense to me. Knowing this would have really come in handy a while back when I was creating my dynamic Excel spreadsheet generator in PHP. While it doesn't follow ASCII char indexes, it follows a base 26 numbering system.
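A sketch of how the spreadsheet use case might look. The helper `column_labels` is hypothetical; it leans on the fact that uppercase strings carry the same way lowercase ones do.

```php
<?php
// Hypothetical helper: labels for the first $count spreadsheet columns.
function column_labels(int $count): array
{
    $labels = [];
    $label = 'A';
    for ($n = 0; $n < $count; $n++) {
        $labels[] = $label;
        $label++; // 'Z' -> 'AA', 'AZ' -> 'BA'
    }
    return $labels;
}

$labels = column_labels(28);
echo implode(' ', array_slice($labels, 24)), "\n"; // Y Z AA AB
```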

Edit: Oh... and if anyone was wondering, PHP's "char" function (provided an index) should perform similar to incrementing an ASCII char in C/C++.

Note: chr not char
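For comparison, a small sketch of stepping by character code with `ord()` and `chr()`, which gives the '{' result discussed earlier in the thread instead of 'aa'.

```php
<?php
// ord('z') is 122; chr(123) is the next ASCII character, '{'.
$next = chr(ord('z') + 1);
var_dump($next); // string(1) "{"
```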

people talk about RoR, Django, Haskell++, Node.js, you name it. me? i'm sticking with PHP. i'm gonna be damn good at this language eventually.

PHP can remain insane longer than you can remain alive.
