

The oldest trick in the ASCII book - jgrahamc
https://blog.cloudflare.com/the-oldest-trick-in-the-ascii-book/

======
ptaipale
It's an old trick to convert between upper and lower case like this, but I
think it's becoming actually harmful to think you can or should do things like
this.

Modern applications are run all over the world, and use text that is often
non-ASCII. Every relevant programming environment has libraries of functions
that do this kind of task efficiently, taking into account that your text is
not just English ASCII but may be Cyrillic, Chinese, Arabic, or even a
European language with letters of Latin origin that are not in the ASCII set.
Or Telugu or Tamil or Thai. Or Klingon, for that matter.

So what purpose would it serve to start coding your own uppercase/lowercase
routines, when you are most likely to get it wrong?

Okay, Tengwar (Elvish) is not yet properly supported by Unicode.

~~~
gumby
> It's an old trick to convert between upper and lower case like this, but I
> think it's becoming actually harmful to think you can or should do things
> like this.
>
> Modern applications are run all over the world, and use text that is often
> non-ASCII.

So: I disagree. And I say that as someone who uses two languages every day
that can't be represented in US-ASCII.

The specific case described by the article was parsing internet protocol data
for, say, SMTP or HTTP. It's best to think of such data as a special kind of
binary protocol that some people happen to be able to read directly. One
artifact of these protocols is that their keywords are case-insensitive.

You can't use this trick on user data (which includes domain names these
days), and the author doesn't say you should. But if you're parsing the
response to an EHLO, for example, it should be perfectly fine to parse it this
way.

Yes, it happens to be in English, but so what? Musical directions are in
Italian, a language I don't speak, but I have no trouble understanding "da
capo al fine", though I might be confused if I saw it anywhere other than a
score.

Plus, I have to say that sometimes I feel there's simply _too much_
abstraction, and it's nice to look at the bits.

~~~
ptaipale
Still, what's wrong with toupper(), besides not getting to show off that you
know the archaic characteristics of ASCII? (Not that I'm free from that sin of
bragging myself, of course.)

Library functions provide idiomatic, readable ways to do things.

~~~
gumby
Of course you should use toupper(). toupper is great and handles all sorts of
corner cases. Anyone who would use a hack like this would have used toupper
here in the first place.

But toupper is slower, AND that slowness is unnecessary in some cases. This is
the kind of optimization you reach for after you have revisited your
algorithms, after you have profiled your code, and when you know you have a
heavily used code path where saving a few cycles really _does_ bring big
gains. If I were writing the parser front end for Gmail's SMTP server, for
example, a micro-optimization like this could pay off.

Or (more likely in my case) when I have an embedded device with only 4K of
program memory that nevertheless needs to talk to a syslog or time server
_and_ still perform its normal function.

This is not a "party trick" optimization a la HAKMEM.

And that's not an "archaic characteristic of ASCII"; it shows thoughtful
design from an age of specific resource constraints. We don't have those
constraints now, but we have others, and this is a good example of how they
can manifest themselves and be designed for.

And for that matter it shows why, for all that I hate it, ASCII itself is not
archaic.

