
> mistakes for no reason

Don't you think that 65 ≠ 97 is sufficient reason?

I mean, 'A' ≠ 'a', in ASCII, Unicode and even EBCDIC. In computers, those are two distinct characters. This fact won't change no matter how you rationalize your expectations.
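A quick way to check this claim yourself is sketched below. It assumes Python's built-in `cp037` codec as a stand-in for EBCDIC (one of several EBCDIC code pages), and `encoded_value` is a hypothetical helper name used only for this illustration.

```python
def encoded_value(ch: str, encoding: str) -> int:
    """Return the single byte that represents `ch` in `encoding`."""
    data = ch.encode(encoding)
    if len(data) != 1:
        raise ValueError(f"{ch!r} is not a single byte in {encoding}")
    return data[0]


# Unicode code points: the numbers the comment above refers to.
print(ord("A"), ord("a"))

# Byte values in ASCII and in one EBCDIC code page (cp037, an assumption).
for enc in ("ascii", "cp037"):
    upper = encoded_value("A", enc)
    lower = encoded_value("a", enc)
    print(enc, upper, lower, "distinct:", upper != lower)
```

The actual numbers differ between encodings, but in each of them the uppercase and lowercase letter get different values.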

Thus, pretending that "y.txt" is the same as "Y.txt" is an elaborate lie. Even granting that it's a "white", well-intentioned lie (designed to preserve the mistaken expectation that "y.txt" is the same as "Y.txt"), I don't like it when computers lie to me; do you?

Like every lie, this one has weird consequences. One of them is today's RCE in the OP. Another was CVE-2014-9390. There are myriad others.
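To make the class of problem concrete, here is a rough sketch that flags names which are distinct as strings but would land on the same entry under case-insensitive lookup. It uses `str.casefold()` as an approximation; real filesystems apply their own folding rules, so treat this as an illustration and not as a model of any particular filesystem. `find_case_collisions` is a hypothetical helper name.

```python
from collections import defaultdict


def find_case_collisions(names):
    """Group names that differ only by case (approximated with casefold)."""
    groups = defaultdict(list)
    for name in names:
        groups[name.casefold()].append(name)
    # Keep only the groups in which more than one spelling occurs.
    return [sorted(group) for group in groups.values() if len(group) > 1]


print(find_case_collisions(["y.txt", "Y.txt", ".git", ".GIT", "README"]))
# → [['Y.txt', 'y.txt'], ['.GIT', '.git']]
```

On a case-sensitive filesystem each of these names is its own file; on a case-insensitive one, each group refers to a single entry, which is how a crafted name can end up overwriting something the tool meant to protect.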

Linux rejects the whole notion of filename case-insensitivity and shows how computers actually work. That makes things easier for developers and more secure for users.

Lastly, don't feel that I'm attacking you; I'm opposing an idea. So, here's a tip: you can set up case-insensitive filename completion in bash, so that TAB will correct your casing mistakes for you. It's a simple one-line change: put `set completion-ignore-case on` into your inputrc (typically ~/.inputrc).




> Don't you think that 65 ≠ 97 is sufficient reason? [...] In computers, those are two distinct characters

In computers, yes, but I am a human, and to me as a human 'A' and 'a' are the same letter.


Fair enough. But notice: systems tend to expect that humans interacting with them observe basic rules. "The capital and lowercase variants of Western alphabet letters are each represented as a distinct character" is one such generic, basic rule of computer systems. Especially if we zoom out from filesystems to a broader context (HTTP, JSON, programming languages, etc.), you can't deny it; it's a fact.

We do have the option of ignoring that fact and saying "What bytes? I don't care. Guess what I mean, and lie to me as well as you can so I can stay happy in my ignorance". But, see, coordinating good support for that isn't easy. Minor wrinkles in it keep causing burns, sometimes RCEs. Maybe "doing in Rome as the Romans do" isn't such bad advice after all?


Why is this an unacceptable lie but the notion of letters instead of code points is acceptable? Especially once you get into multi-byte characters?


I didn't say it's unacceptable, nor did I mean that. In many contexts, things would be tough without case-insensitive regex matching, for example. But, reinforcing my point, //i runs into issues once applied to the entirety of Unicode.
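A small sketch of the kind of wrinkle meant here, using Python's `re` module as one example engine (other regex engines may behave differently). The German "ß" is the usual illustration: its uppercase form is two characters, so case conversion does not round-trip and a one-to-one case-insensitive match cannot cover it.

```python
import re

word = "straße"

# Uppercasing expands "ß" to "SS", so the round trip loses the original.
upper = word.upper()
print(upper)          # STRASSE
print(upper.lower())  # strasse, not straße

# Full case folding treats "ß" and "ss" as equivalent...
print("ß".casefold() == "ss".casefold())  # True

# ...but a case-insensitive regex compares character by character here,
# so the single character "ß" does not match the two characters "SS".
print(re.fullmatch("ß", "SS", re.IGNORECASE))  # None
```

So "ignore case" already has at least two different answers for the same pair of strings, depending on which notion of case the tool implements.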

It's almost comical: people keep insisting on "letters, not code points" while knowing very well how bad computers are at guesswork and under-defined notions. Issues stemming from that keep coming up. What if, instead, the norm accepted that 'A' ≠ 'a' and stopped creating problems that computers are known to handle poorly?


Well, then I think computers would be less useful than they are. The machines serve us, not the other way around.


Why does unix show paths as strings? It's a lie.

65 ≠ 'A'.



