
The type of char + char - alexchamberlain
https://blog.knatten.org/2019/05/24/no-one-knows-the-type-of-char-char/
======
_kst_
I wouldn't describe this as "no-one knows" the type of char + char.

I know what the type of char + char is. I know that it's either int or
unsigned int, depending on the ranges of values supported by types char and
int. I know what it is for any given implementation. And I know that it's int,
not unsigned int, for every implementation I've ever used or am likely to use.

Implementation-defined features are not some unsolvable mystery. They're just
implementation-defined.
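
To make the "int or unsigned int" point concrete, here is a minimal C++ sketch that asks the compiler directly. The names `CharSum`, `int_holds_all_chars`, `Expected` and `char_sum_is_int` are made up for illustration; the only assumption is a C++11 compiler.

```cpp
#include <cassert>      // for the runtime check in the usage note
#include <climits>
#include <type_traits>

// The type the compiler actually gives to char + char.
using CharSum = decltype(char{} + char{});

// Integral promotion picks int when int can represent every value of char,
// and unsigned int otherwise.
constexpr bool int_holds_all_chars =
    CHAR_MIN >= INT_MIN && CHAR_MAX <= INT_MAX;

// The type the promotion rule predicts for this implementation.
using Expected = std::conditional<int_holds_all_chars, int, unsigned int>::type;

static_assert(std::is_same<CharSum, Expected>::value,
              "char + char should follow the integral promotion rule");

// Hypothetical helper: true on implementations where the sum is int.
constexpr bool char_sum_is_int() {
    return std::is_same<CharSum, int>::value;
}
```

On the usual desktop and server compilers, `assert(char_sum_is_int());` should hold; an implementation where it fails would be one of the unusual ones discussed here.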

~~~
BearOso
And we can count 99.999% of used implementations on one hand. If you’re on a
strange platform, there’s a reason for that, and chances are you’re uniquely
aware of any differences or will be writing assembly.

------
flqn
This confirms one of the guidelines I've always been taught: char is not an
arithmetic type, and you should never treat it as such. It represents ASCII
characters, and nothing else.

~~~
Piezoid
(u)int8_t have the same problems, including aliasing, because they are usually
just aliases of (un)signed char. Sometimes it's nice to have modular arithmetic
mod 256, or a compact memory layout for e.g. count sketches.
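
A small sketch of what mod-256 arithmetic looks like in practice, assuming uint8_t exists on the platform. The helper name `add_mod256` is invented for this example. The addition itself happens in int; the wraparound only appears when the result is converted back.

```cpp
#include <cassert>      // for the runtime checks in the usage note
#include <cstdint>
#include <type_traits>

// Both uint8_t operands are promoted to int before the addition,
// so the sum itself never wraps.
static_assert(std::is_same<decltype(std::uint8_t{} + std::uint8_t{}), int>::value,
              "uint8_t + uint8_t is computed as int");

// Hypothetical helper: add modulo 256 by converting the int result
// back to uint8_t (conversion to an unsigned type is well defined).
constexpr std::uint8_t add_mod256(std::uint8_t a, std::uint8_t b) {
    return static_cast<std::uint8_t>(a + b);
}
```

For example, `assert(add_mod256(200, 100) == 44);` holds, since 300 mod 256 is 44, and `assert(add_mod256(255, 1) == 0);` shows the wraparound at the top of the range.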

~~~
wahern
If int8_t exists[1], then you know that char is 8 bits[2] and therefore know
that char in char + char always promotes to int because int must have at least
16 value bits and a 16-bit int can represent any char value regardless of
signedness.

[1] int8_t is not required.

[2] char is the fundamental unit of addressability. sizeof(char) _always_
evaluates to 1, sizeof(int8_t) must be non-zero, char must be at least 8 bits,
and int8_t must be precisely 8 bits, therefore sizeof(int8_t) == sizeof(char)
and CHAR_BIT == 8.
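
The chain of reasoning above can be written down as compile-time checks. This is only a sketch: it compiles only where int8_t exists, and the helper names `char_bits` and `promotes_to_int` are made up for illustration.

```cpp
#include <cassert>      // for the runtime checks in the usage note
#include <climits>
#include <cstdint>
#include <type_traits>

// sizeof(char) is 1 by definition, and char has at least 8 bits.
static_assert(sizeof(char) == 1, "char is the unit of sizeof");
static_assert(CHAR_BIT >= 8, "char has at least 8 bits");

// Using std::int8_t at all assumes the optional type is provided.
// It is exactly 8 bits wide, so it must occupy exactly one char.
static_assert(sizeof(std::int8_t) == sizeof(char), "int8_t is one char wide");
static_assert(CHAR_BIT == 8, "so char is exactly 8 bits");

// int covers at least -32767..32767, which includes every 8-bit value.
static_assert(INT_MAX >= 32767, "int has at least 16 bits");

// Hypothetical helpers exposing the conclusions at run time.
constexpr int char_bits() { return CHAR_BIT; }
constexpr bool promotes_to_int() {
    return std::is_same<decltype(char{} + char{}), int>::value;
}
```

If this compiles, `assert(char_bits() == 8);` and `assert(promotes_to_int());` should both hold.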

------
joker3
Why do we need to be able to add two characters again?

~~~
XMPPwocky

      char toupper(char c) {
        if (c >= 'a' && c <= 'z') {
          return (c - 'a') + 'A';
        } else { return c; }
      }

~~~
kccqzy
You showed an example of subtracting a character from another. The GP asked
for an example of adding two characters.

~~~
tlb

      (c - 'a') + 'A'
    

contains both

~~~
XMPPwocky
I thought about saying

    
    
       (c + 'A') - 'a'
    

to make this more clear, but I think that's actually UB with signed chars-
e.g. for c='a', 'a'+'A' exceeds the range of a signed 8-bit value!

Promotion should save us here, but that's a bit too yikes-y for my comfort.
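
A quick way to see promotion at work, as a sketch that assumes an ASCII execution character set and an int wider than char. The function names `sum_of_a_and_A` and `to_upper_ascii` are invented for this example.

```cpp
#include <cassert>      // for the runtime checks in the usage note
#include <type_traits>

// Both operands are promoted before the addition, so the intermediate
// value lives in an int, not in an 8-bit char.
static_assert(std::is_same<decltype('a' + 'A'), int>::value,
              "char + char is computed as int on this implementation");

// In ASCII this is 97 + 65 = 162, which is above 127 but fits easily in int.
constexpr int sum_of_a_and_A() { return 'a' + 'A'; }

// Hypothetical ASCII-only upper-casing using the (c + 'A') - 'a' ordering.
// The final value is back in the range 'A'..'Z', so the cast loses nothing.
constexpr char to_upper_ascii(char c) {
    return (c >= 'a' && c <= 'z') ? static_cast<char>((c + 'A') - 'a') : c;
}
```

With ASCII, `assert(sum_of_a_and_A() == 162);` and `assert(to_upper_ascii('a') == 'A');` both hold, so the intermediate sum is never squeezed into a signed 8-bit value.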

------
jimbo1qaz
I found some insightful comments below the post:

>I think that char – char should definitely be legal. The distance between
characters is well defined. Same for char + numeric. Both logically make
sense. I think a good analogy might be floors in a building. Asking what’s the
distance between the second and seventh floor makes sense, or what’s two
floors above the 4th. But the question ‘what’s the 5th floor plus the 6th
floor’ doesn’t make sense.

> _Affine space_ describes these kinds of relationships in mathematics, e.g.
> position and displacement in n dimensions, or count and offset in buffers,
> even timestamp and duration.

------
nayuki
I agree with the article. Here are discussions of related problems with C/C++
arithmetic promotions and overflow:

[https://stackoverflow.com/questions/27001604/32-bit-
unsigned...](https://stackoverflow.com/questions/27001604/32-bit-unsigned-
multiply-on-64-bit-causing-undefined-behavior)

[https://stackoverflow.com/questions/39964651/is-masking-
befo...](https://stackoverflow.com/questions/39964651/is-masking-before-
unsigned-left-shift-in-c-c-too-paranoid)

------
klyrs
I've been writing C for 25 years... and while I technically know "the answer,"
it's effectively a closed door in my mind because I don't always know where my
code will end up.

A sadistic part of me would prefer if it was interpreted as a bitwise and...
not because that's good or reasonable or smart... but to punish the behavior.
But then that backfires when people use it for underhanded code.

------
kstenerud
Yes, yes. The spec is filled with anachronisms that are no longer pertinent in
today's machines. char + char gets promoted to int every time in today's
compilers. Try it out here:
[https://godbolt.org/z/V5HEvV](https://godbolt.org/z/V5HEvV)

~~~
loeg
50-50 anachronisms vs flexibility to allow C to run on novel machines that we
don't currently envision. Sure, it would be nice for developers on today's
machines to reduce it to the conventional subset.

~~~
kstenerud
If by novel you also imply compatible, then sure. The moment you create a
machine that's incompatible with the conventions adopted by the most popular
compilers and architectures, you break a ton of software built upon those
conventions, and sink your hardware in the market because it's a portability
nightmare.

Specs don't matter beyond the conventions they inspire.

------
loeg
A machine where char is as large as int is unlikely in practice as it isn't
very useful. C11 (at least) requires INT_MIN/INT_MAX to cover at least -32767
to +32767, essentially the range of an int16_t type.

That said, the int promotion alone may be surprising / nonobvious to some
people (it was to me, when I learned about it!).

------
fulafel
Also, if char is signed, char + char may be UB and with known overflowing
values the compiler may deduce it's a can't-happen situation, generating code
accordingly. Or when encountered at runtime, it may hose your program state
arbitrarily, etc.

------
kccqzy
There are hardly any systems relevant today for which adding two char would
result in an unsigned int. So basically just treat it as int and call it a
day.

~~~
viraptor
Or if you're using one, you're likely very aware of that fact and don't
suddenly discover it from a blog post.

------
quocble
Does it matter if it is signed or unsigned int or char? Bitwise it contains
the same amount of information. That's the most important thing.

------
MaulingMonkey
decltype('a'+'a')

~~~
corysama
int

[https://cppinsights.io/s/5dc91167](https://cppinsights.io/s/5dc91167)

