

PHP Bug #18556 : Setting locale to 'tr_TR' lowercases class names - pixdamix
https://bugs.php.net/bug.php?id=18556

======
Mithrandir
I think this was a good explanation:

"No, the problem results because lowercase i (in most languages) and uppercase
I (in most languages) are not actually considered to be the upper/lower
variant of the same letter in Turkish. In Turkish, the undotted ı is the
lowercase of I, and the dotted İ is the uppercase of i. If you have a class
named Image, it will break if the locale is changed to Turkish because
class_exists() function uses zend_str_tolower(), and changes the case on all
classes, because they are supposed to be case insensitive. Someone else above
explained it very well:

"class_exists() function uses zend_str_tolower(). zend_str_tolower() uses
zend_tolower(). zend_tolower() uses _tolower_l() on Windows and tolower() on
other OSes. _tolower_l() is not locale aware. tolower() is LC_CTYPE aware."

Edit: Someone else later said the following (I'm wondering if it's true):

"This, practically, can't be fixed. Mainly because there's no way to know if
'I' is uppercase of 'i' or 'ı' since there's not a separate place for Turkish
'I' in code tables. The same holds for 'i' (can't be known if it's lowercase
of 'I' or 'İ'). I told 2 years ago and will say it again: PHP should provide a
way to turn off case-insensitive function/class name lookup. No good
programmer uses this Basic language feature since identifiers are case-
sensitive in all real languages like Python, Ruby, C#, Java."

~~~
simias
But why should the locale change the way PHP code is interpreted? Shouldn't
LC_ALL be "C" when parsing the code?

Maybe it breaks if you embed unicode strings or something. What do other
languages do?

~~~
shuzchen
If it wasn't clear by the comments on the bug report or by the quoted sections
of this comment's parent, let me rephrase it. This issue is entirely caused by
the fact that PHP is case insensitive for classes and function names (but not
variables, go figure). That is, if you define a class MyClass, you can
instantiate it using MyClass or myclass or MYCLASS. You can call the functions
from the standard library in whatever case either (so, array_map or ARRAY_MAP
is fine).

Based on the behavior of this bug, it appears that the way PHP handles this
case insensitivity is that it just lowercases all class and function names
before resolving them. And this bug in particular shows up for Turkish because
'i' is not the lowercase equivalent of 'I'.

Pretty much all other modern languages are case sensitive, so I'd be surprised
to find this issue elsewhere.
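
To make the mechanism concrete, here is a minimal Python sketch of the lowercase-then-look-up scheme described above. It is a model, not PHP's implementation: `ascii_lower`, `turkish_lower` and `class_table` are hypothetical names, and the Turkish rule is hand-coded rather than taken from the C library.

```python
def ascii_lower(name: str) -> str:
    """Lowercase A-Z only, as the C locale would."""
    return "".join(chr(ord(c) + 32) if "A" <= c <= "Z" else c for c in name)


def turkish_lower(name: str) -> str:
    """Hand-coded Turkish rule: I -> ı (dotless), İ -> i, then the rest."""
    return name.replace("İ", "i").replace("I", "ı").lower()


# Case-insensitive lookup modeled as: lowercase the name, use it as the key.
class_table = {ascii_lower("MyClass"): "MyClass"}
spellings_found = [ascii_lower(s) in class_table
                   for s in ("MyClass", "myclass", "MYCLASS")]

# The two rules disagree as soon as the name contains an uppercase I.
c_key = ascii_lower("Image")     # "image"
tr_key = turkish_lower("Image")  # "ımage", with a dotless ı
```

All three spellings of MyClass resolve to the same key, while "Image" produces two different keys depending on which rule is applied.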

~~~
pieter
That still doesn't make sense. If 'i' is not the lowercase equivalent of 'I',
then the lowercasing should just result in another letter, right? The only
thing that could cause the bug is if it uses two different ways of lowercasing
(perhaps one when registering the class, and another way when looking up the
class).

The mapping between uppercase and lowercase can be completely arbitrary, and
as long as it's used consistently you shouldn't get these kinds of bugs.

~~~
bnr
The issue only occurs when the locale is changed between registering and
looking up the class.

~~~
pieter
Doesn't look like that from the bug report; there the locale is set first,
then the class is defined and then looked up.

~~~
bnr
PHP registers classes (and functions) at parse time, not at execution time.

i.e. this will print "bar":

    
    
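        <?php
        // foo() is registered when the file is compiled, so it can be
        // called before the line that defines it.
        echo foo();
        function foo() { return "bar"; }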
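
Putting the two observations together (declarations are registered before any statement runs, and the lowercasing rule follows the current locale), the failure from the bug report can be modeled in a few lines of Python. This is a toy interpreter under those assumptions, not PHP's engine; `run`, the tuple-based program format, and both lowercasing helpers are made up for illustration.

```python
def ascii_lower(name):
    """Lowercase A-Z only, as the C locale would."""
    return "".join(chr(ord(c) + 32) if "A" <= c <= "Z" else c for c in name)


def turkish_lower(name):
    """Hand-coded Turkish rule: I -> ı (dotless), İ -> i."""
    return name.replace("İ", "i").replace("I", "ı").lower()


def run(program):
    """Run a list of (op, arg) pairs; return one bool per 'new' lookup."""
    lower = ascii_lower          # assumed default locale at startup
    classes = {}
    # Pass 1, "parse time": register every class declaration up front.
    for op, arg in program:
        if op == "class":
            classes[lower(arg)] = arg
    # Pass 2, execution: setlocale now changes the rule used for lookups.
    results = []
    for op, arg in program:
        if op == "setlocale":
            lower = turkish_lower if arg == "tr_TR" else ascii_lower
        elif op == "new":
            results.append(lower(arg) in classes)
    return results


bug_report = [("setlocale", "tr_TR"), ("class", "Info"), ("new", "Info")]
print(run(bug_report))      # [False]: registered as "info", looked up as "ınfo"
print(run(bug_report[1:]))  # [True]: same program without the setlocale call
```

In this model the source order (setlocale first, then the class, then the lookup) does not help, because the class is registered before the setlocale statement ever executes.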

------
dools
If there are so many people depending on PHP and all the code written in PHP
in all of Turkey, why doesn't someone in Turkey fix the problem?

Or anywhere for that matter?

There is no "they" in this equation. There is no person who should be held
more accountable than you or I for fixing this problem.

The choices are simple:

1) Fix the problem

2) Find a workaround

3) Don't use PHP

What's that? There is a lot of open source software that you wanted to use for
free that's written in PHP that does just what you need except for this tiny
little trivial thing that should be easy to fix? Well too bad!

Trade off the cost of fixing it against the cost of rewriting the big, free,
open source package that's written in PHP you wanted to use, in the
programming language of your choice, and stop complaining.

~~~
simias
It might not be an easy fix if you're not familiar with PHP's guts. It's the
kind of fix that can induce a lot of unexpected regressions.

Not wanting to fix a bug because it's not worth the time or risks breaking
backward compatibility is perfectly fine by me. But at least make a decision
and say something.

If they don't plan on fixing it they should say something like "We believe
this is a minor bug that only concerns a small number of users. In order to
fix this we'd need to change X, Y and Z and make sure we don't introduce
regressions. If you want to try and do it we'll be glad to review your
patches. In the meantime you can use this workaround: [...]".

I hate it when I submit a bug report and it gets ignored. You also build a
strawman argument with the "lot of open source software that you wanted to use
for free". It's a bug and should be fixed (even if the fix is closing the
ticket as "wontfix").

~~~
dools
The "strawman" you're referring to was in response to this:

<http://news.ycombinator.com/item?id=4188167>

specifically the complaint that this problem manifests with lots of off-the-
shelf software (although I suppose he didn't specify FOSS in his original
comment).

However, I wasn't directly responding to that guy, I was more responding to
what I feel has been aptly described as a "witch hunt" by others on this page.

 _It might not be an easy fix if you're not familiar with PHP's guts_

10 years is a long time for someone to have the chance to get familiar with
it.

Even if you assume that for 8 years, everyone was saying "oh, it will get
fixed some time" even 2 years is a long time for anyone affected by this
problem seriously enough to become familiar enough with PHP to fix the problem
_if that's the path that will produce the most value for them_ (ie. if there's
enough value in some existing codebase or off the shelf software to warrant
fixing this if there's truly no other workaround).

Still, I can see the sense in promoting major issues like this with PHP, but
posting the bug report on the front page of HN is far less useful than, say,
writing a blog post about it with some case studies of where the problem has
been manifest, how people have dealt with it, the history of the bug, etc.

Actually that's a good blog post, might put it on my list ;)

------
alpb
This is a huge bug. Believe it or not, many dev people in Turkey use locale tr_TR
(which is perfectly normal) and when they begin to use "any" off-the-shelf PHP
library/class with uppercase-I, it does not work at all. A little example, if
APC has a class with I, it won't work on your tr_TR configured Windows Server.

PHP is crap. Not even classical ASP had such bugs and it was perfectly passing
the Turkey test (<http://www.codinghorror.com/blog/2008/03/whats-wrong-with-turkey.html>)
and Unicode supporting languages didn't have such a bug. E.g. Java, Python.

PHP is crap. This bug is clearly a WONTFIX; it's been 10 years since it was
reported. I remember this bug when I was 14, thank God I moved on to other
languages afterwards.

~~~
kalleboo
If this is such a dealbreaker for developers in Turkey, why have none of them,
in the 10 years this bug has been alive, submitted a patch for it? PHP is open
source, it relies on code submissions.

edit: not trolling, just curious. What drives people to complain about
specific, well-defined open source bugs without any effort to fix it? I
understand hard-to-nail-down issues like user experience, but this shouldn't
be that hard to plan out and fix independently.

~~~
slurgfest
If there are viable alternative projects which never had that problem, you can
save all the time rather than trying to salvage someone else's broken
software. It's a lot less time and trouble. What reason do I even have for
fixing your project? I don't owe PHP loyalty when it is broken for me.

If there are viable alternative projects which are more responsive to bug
reports, that is more promising for the future - if a second bug I see is
reasonably likely to be fixed in the future, I can feel more confident basing
my own code on it.

People spend months and years writing apps on top of things like PHP and once
they have the code they don't necessarily have a lot of choice. At that point
maybe you fix the bugs in your dependencies rather than rewrite your own app.
But when you have a choice, you don't adopt a tool which is going to leave you
with this much technical liability.

This is offered peacefully in an attempt to explain the question which seems
to confuse you.

------
mikeash
Every time an article critical of PHP appears, defenders come out of the
woodwork. It's a great language, they say. It's no more flawed than any other
language. Critics are just biased. It has problems, but other languages have
problems too. People build large apps with PHP, so it must be good.

But come on. This language is complete crap. Code spontaneously fails
depending on the locale? And the bug has been open for _ten years_ and still
is not fixed? And this is only one bizarre and inexplicable bug out of
hundreds, maybe thousands, of bizarre and inexplicable bugs in PHP.

This language isn't defensible. If you want to say that it's worth dealing
with the flaws due to the ecosystem, fine, fair enough. But don't tell us that
PHP is no worse than any other language. It's _far_ worse.

~~~
lonnyk
> Code spontaneously fails depending on the locale?

It doesn't spontaneously fail. The language's functions are case-insensitive
and they documented this. [1] [2] When you change the locale to Turkish the
letters change. Thus, the class name changes and no longer works as expected.

So it is documented because it may not behave as expected, but it is not spontaneous.

[1] <http://www.php.net/manual/en/functions.user-defined.php>

[2] <https://gist.github.com/3033533>

~~~
viraptor
That's incorrect. It's not behaving as documented. Whether you compare in a
case-sensitive or case-insensitive way "Info" should always match "Info". The
bug results in a situation where it doesn't.

I'd accept that you cannot reference class "info" using name "Info" in Turkish
locale, but that's not the case here.

------
yuvadam
It's kind of hard _not_ to bash PHP for crap like this.

Yes, the PHP ecosystem is friendly, easy, cheap, etc. etc.

But as a programming language _per se_... Come on, PHP.

------
j_col
PHP is a big legacy open source project, worked on by many volunteers whenever
they can spare the time, just like any other open source project.

It is wildly successful despite this and many other bugs.

I only wish that the people who spend as much time attacking PHP and its
developers endlessly would instead focus some of that energy into helping to
improve PHP, but I guess some of us are just negatively charged.

Sad that we have yet another anti-PHP posting hitting the front of HN in as
many days, let the hating re-commence (again)...

~~~
samdk
Responses like this to people who don't like PHP are just as bad as the people
constantly and loudly attacking it. Neither accomplishes anything other than
building animosity.

Your suggestion that people improve PHP instead of attacking it is naive. PHP
is, as you said, a big legacy open source project. As a result of that, it's
basically impossible to make the extreme, breaking changes that many people
(me included) think would be required to make it a reasonable competitor to
the existing options. (And the PHP community is not especially inclined to
change. It took years for _short array syntax_ to get added to the language.
If something as obviously beneficial as that is going to be hotly debated,
making real, breaking changes is impossible.)

Faced with the alternatives of trying to radically change PHP (which is, as I
said above, impossible) or to use and improve other languages and frameworks,
I think the choice is obvious. It was one thing 5-10 years ago when there
weren't necessarily good or mature alternatives, but we have many choices now.
In my opinion, it makes very little sense to use something with as much
extraordinarily painful legacy baggage as PHP unless you have an exceptionally
good reason for doing so.

~~~
codinghorror
After 5+ years of eloquent, smart programmers* posting long, well researched
screeds about what's deeply broken with PHP's design at the most fundamental
levels, there is no other conclusion to be reached. The only way to improve
PHP is to replace it, and we have a long way to go to get there.

* Note that I am _not_ including myself in this list. But any trivial search for "what's wrong with PHP", much less "PHP sucks" produces 5+ years of very bright, articulate, sometimes downright famous programmers making this same point about PHP. Most recently Jamie Zawinski at <http://www.jwz.org/blog/2011/05/computational-feces/#comment-90634>

~~~
j_col
> After 5+ years of eloquent, smart programmers* posting long, well researched
> screeds about what's deeply broken with PHP's design at the most fundamental
> levels, there is no other conclusion to be reached.

Issuing holy decrees from their ivory towers, more like. Meanwhile, lots of
tremendously successful companies are doing real work in PHP each and every day,
at the coal face, where it matters. Does their hard work deserve this constant
ridicule?

> The only way to improve PHP is to replace it, and we have a long way to go
> to get there.

Agreed. When something better comes along, I'll start using it (like how I
switched from Perl to PHP a long time ago). The problem is, many "eloquent,
smart programmers" are too busy "posting long, well researched screeds" to
spend some time making PHP better (or making a better PHP).

~~~
Tloewald
It seems to me that the kicker is that the obviously superior alternatives, such
as Python, Ruby, and JavaScript, don't offer the ease of deployment that PHP has
via mod_php, and instead insist on the developer writing a web server. While
there are advantages (performance, control) to the web server approach, it is
clear that there are advantages (simplicity) to being able to stick code
snippets in web pages.

If you could deploy (say) Python-decorated web pages via Apache (on el cheapo
hosting services) instead of writing your own server and figuring out how to
host it, then the problem would be solved. We have the languages, just not the
ecosystems.

Obviously I'm not the first to observe this. mod_python exists, it's just not
popular.

~~~
secoif
I don't know any rails developer who has written a web server, where did you
get that idea? Many use mod_rails via nginx or apache:

<http://www.modrails.com/>

Also, PAAS offerings like Heroku and Engine Yard make deploying sophisticated
rails environments far more convenient than their PHP equivalent.

> "sticking code snippets in web pages"

You can do exactly this in ruby with ERB, but many shy away from this approach
due to a distaste for the bolognaise pattern.

You should try ruby/rails/sinatra on your next project, it will change your
life.

~~~
Tloewald
I've tried rails, years ago. Don't like its fundamental design. My life
remains unchanged.

BTW how do you think rails serves web pages?

------
gokhan
That's why, for example, .NET world has .ToLowerInvariant() and
.ToUpperInvariant() and developers are advised to use it when doing internal
stuff. Interpreting / parsing a language is clearly an internal task and
shouldn't be affected by locale changes.
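
In the same spirit, a locale-independent fold for identifiers might look like the following Python sketch. `Registry` and `invariant_lower` are hypothetical names; the point is only that the key computation never consults the current locale, which is the role .ToLowerInvariant() plays in the comment above.

```python
class Registry:
    """Case-insensitive name table whose keys ignore the current locale."""

    def __init__(self):
        self.current_locale = "C"   # may change at run time; never used for keys
        self._entries = {}

    @staticmethod
    def invariant_lower(name: str) -> str:
        # Fold A-Z only; every other character is left untouched.
        return "".join(chr(ord(c) + 32) if "A" <= c <= "Z" else c for c in name)

    def register(self, name: str) -> None:
        self._entries[self.invariant_lower(name)] = name

    def lookup(self, name: str):
        return self._entries.get(self.invariant_lower(name))


registry = Registry()
registry.register("Image")
registry.current_locale = "tr_TR"   # a locale change no longer matters
print(registry.lookup("Image"))     # Image
print(registry.lookup("IMAGE"))     # Image
```

The trade-off in this sketch is that identifiers are only case-insensitive over A-Z; names containing other letters would have to match exactly.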

~~~
Draiken
Unfortunately you can't compare .NET to PHP. Ever.

~~~
billpg
They just did.

------
fmavituna
It's also referred to as the Turkey Test:
<http://www.moserware.com/2008/02/does-your-code-pass-turkey-test.html>

<http://www.codinghorror.com/blog/2008/03/whats-wrong-with-turkey.html>

------
celalo
I is not the capital of i in Turkish. Instead, İ is the capital of i and I is
the capital of ı. They are two different letters.

~~~
Raticide
No other language has this problem. The locale is irrelevant. The class name
is just a series of bytes; it shouldn't need to transform the case.

~~~
pilif
True for languages which are case-sensitive.

Other languages like PHP (partially), BASIC or Pascal are case insensitive, so
lookup has to be done case-insensitively which means that case has to be
normalized, so transforming case of identifier becomes necessary. If it can't
be done consistently, that's a problem.

~~~
klmr
So you use a culture-invariant locale for parsing. Still not a problem.

~~~
pilif
How do you lowercase in a culture-invariant locale? Or do you mean English?

~~~
mikeash
No matter how you do it, a class name should always match when looked up with
an _identical string_.

------
patio11
The situation is not helped by the frequent OSS community suggestion: "Just
patch Turkish." while mumbling "Bloody non-ASCII ingrates."

------
robryan
What was the advantage of case-insensitive class and function names? Sounds to
me like something that was implemented very early on without great reasons and
then kept for backwards compatibility. In all my programming in PHP I have
never thought to take advantage of this.

~~~
RobAley
I'm assuming the original reason is lost in the mists of time, but one
advantage it has when calling/using external/3rd party code is in style
conventions. If in my code my convention is to use functionNames but in yours
you use functionnames or FunctionNames, I can still code in my style after
include()ing your file. A small advantage, granted.

------
viraptor
I don't understand what's the problem with fixing this really. I would
completely agree that making "Info" and "info" class names compatible is "not
fixable", but what is the problem in making "Info" work if both the definition
and usage are the same case? The bug says that this is exactly backwards -
mixed case works, but same case doesn't.

The only way to make it not work is to first change the case in one locale and
then case-insensitive compare it in another locale. Why would this kind of
operation ever happen? Any sane situation should "just work":

- in declaration convert to lower-case and save, in usage convert to
lower-case and lookup -> has to work

- in declaration save original, in usage search all classes with
case-insensitive compare -> has to work

How was that bug ever created in the first place? I get the fact that "I"
doesn't match to lower-case "i" in tr_TR, but why does it matter when
comparing strings which should be equal? Just be consistent in how both the
declarations and usages are converted...
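
Both strategies in the list above can be sketched in Python to show that either one works for identical spellings, whatever the case mapping is, as long as the same mapping is used on both sides. The helper names and the hand-coded Turkish rule are assumptions for illustration only.

```python
def ascii_lower(name):
    """Lowercase A-Z only, as the C locale would."""
    return "".join(chr(ord(c) + 32) if "A" <= c <= "Z" else c for c in name)


def turkish_lower(name):
    """Hand-coded Turkish rule: I -> ı (dotless), İ -> i."""
    return name.replace("İ", "i").replace("I", "ı").lower()


def lookup_by_folded_key(declared, used, fold):
    """Strategy 1: fold at declaration, fold again at lookup."""
    table = {fold(name): name for name in declared}
    return table.get(fold(used))


def lookup_by_scan(declared, used, fold):
    """Strategy 2: keep original names, compare folded forms at lookup."""
    for name in declared:
        if fold(name) == fold(used):
            return name
    return None


# Identical spelling is found under either mapping, with either strategy.
for fold in (ascii_lower, turkish_lower):
    assert lookup_by_folded_key(["Info"], "Info", fold) == "Info"
    assert lookup_by_scan(["Info"], "Info", fold) == "Info"

# Only the mixed-case spelling depends on the mapping in use.
print(lookup_by_scan(["Info"], "info", ascii_lower))    # Info
print(lookup_by_scan(["Info"], "info", turkish_lower))  # None
```

Under these assumptions the only way to lose "Info" when it is looked up as "Info" is to fold with one mapping at declaration and a different one at lookup.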

------
robryan
It is likely no one who regularly commits to PHP is affected by this issue and
it requires non-trivial changes to the way PHP works to fix.

Granted given the usual pragmatism of PHP someone should have just hacked
something in by now.

------
cocoflunchy
I'm looking forward to the tenth birthday of this bug... Only 24 days to go!

------
dkhenry
This is my biggest problem with PHP. Aside from poor language construction
and the plethora of poorly written code, the core language has lots of problems
in it. When upgrading to PHP 5.4.3 I found six or seven show-stopper bugs in
PHP and some of its extensions (one of which has never worked). I am still
waiting on the fix to one of them: <https://bugs.php.net/bug.php?id=62302>

------
jister
If this is a known bug and it's been there for 10 years then why the hell did
the developer STILL choose to use PHP in the first place?

~~~
Draiken
Unfortunately on legacy systems someone else chose PHP for him a long time
ago... Poor developer that has to deal with this stuff. Been there.

------
kitsune_
Words fail me.

~~~
dasil003
I know it seems insane, but Turkish capitalization is not fun to work with as
a programmer. When they latinized the alphabet 100 years ago or so, they were
short on vowels and so it must have seemed pretty clever and convenient to
make i and I separate letters, with İ and ı as their respective case pairs.
From a Western programmer's perspective, though, it's one of the worst Unicode
special cases owing to its combined unexpectedness and commonness.

Just as an example, text-transform: uppercase has been broken in Turkish for
all major browsers until I believe Firefox finally fixed it late last year,
after having a bug open for _nearly a decade_.

~~~
obilgic
Just my curiosity, how do you know that they were short on vowels?

------
devgutt
Bug is a bug is a bug is a bug

------
luminaobscura
wow, it has been 10 years and no fix!

------
ObnoxiousJul
Please stop saying PHP is crap. The volume of news on the topic is way too
high. Consider that PHP coders are beginning to migrate to things like Python,
and that most of them don't want to learn programming; they still want to
monkey-write programs until, through trial and error, they work. I am on one of
the #python-xx IRC channels, and it is a horror.

PHP is cool, it is useful, it is a magnet for bad developers. This way they
don't pollute our ecosystems.

------
4qbomb
PHP is to Ruby (or whatever your hater flavor is) as Christians are to Muslims.
Neither of them is going to go away until one of them kills all the others.
The more likely alternative is something else coming and destroying them both.

~~~
4qbomb
This is simply proof of the demographic and blind nature of the Hacker News
audience. Pathetic.

