

Sizeof(char) is 1 - signa11
http://drj11.wordpress.com/2007/04/08/sizeofchar-is-1/

======
xpaulbettsx
I actually disagree - while sizeof(char) might always be 1, code isn't just to
communicate to a compiler, it's to communicate to _another developer_. When I
see "sizeof(char) * strlen(foo)", I know that they want to allocate 'n'
characters and that this is a string buffer. Even though as a dev I certainly
could've figured it out, making code read the way you think makes it far faster
for others to interpret the intent of the code.

It's like when people define a 64MB buffer as something like:

#define BUFSIZE (64 * 1024 * 1024)

Even though you could've written 67108864, the former is far more
comprehensible.
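A quick sketch in C11 showing that the two spellings are the same number, with the compiler doing the arithmetic. The helper name `bufsize_bytes` is made up for illustration, and this assumes `int` is at least 32 bits wide.

```c
/* Readable spelling of 64 MB; note there is no trailing semicolon. */
#define BUFSIZE (64 * 1024 * 1024)

/* Checked at compile time: both spellings denote the same value. */
_Static_assert(BUFSIZE == 67108864, "64 * 1024 * 1024 bytes");

/* Hypothetical accessor, only here so the value can be inspected. */
long bufsize_bytes(void)
{
    return BUFSIZE;
}
```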

~~~
CoreDumpling
It bears repeating:

"[P]rograms must be written for people to read, and only incidentally for
machines to execute." -- Abelson & Sussman, _Structure and Interpretation of
Computer Programs_ [1], strangely misquoted by Paul Graham [2]

[1] <http://mitpress.mit.edu/sicp/full-text/sicp/book/node3.html>

[2] <http://www.paulgraham.com/hp.html>

~~~
yycom
The best quote IMO is Knuth: programming is the art of telling a human what
you want the computer to do.

------
rdtsc
While the author is ridiculing those "silly" Mozilla devs I would like to
ridicule the author of the blog. He seems like the person who would try to
write "clever" code. Code that not only works correctly but teaches everyone
_all_ the intricate operator precedence rules and all the possible usages of
bit operators, perhaps also code where they "cleverly" manipulate the call
stack with asm inline instructions to demonstrate their assembly and C
knowledge.

One thing I know -- that person is dangerous. How do I know? I was (perhaps
still am) that person. It was a bad habit and I am still trying to get rid of it.
It is a veiled show of immaturity and arrogance.

~~~
wlievens
Absolutely.

I once had to deal with code that did something like this:

    
    
      return a ? b : c ? d : e ? f : g;
    

I'm not kidding.

~~~
rix0r
Which is actually pretty useful and readable, imho (if formatted correctly):

    
    
      return a ? b :
             c ? d :
             e ? f :
             g;
    

Side note: this does not work in PHP, as the operator is left-associative
instead of right-associative there. The above in PHP would be evaluated as:

    
    
      return ((a ? b : c) ? d : e) ? f : g;
    

Which is almost never what you want. I can't even format this properly to
convey intent.
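A minimal C sketch of the right-associative reading, for comparison with the if-statement style. The function names `pick_ternary` and `pick_if` are made up for illustration. C parses the chain as `a ? b : (c ? d : (e ? f : g))`, so the two functions should agree for every input.

```c
/* Hypothetical helper: the chained conditional, formatted one test per line. */
int pick_ternary(int a, int b, int c, int d, int e, int f, int g)
{
    return a ? b :
           c ? d :
           e ? f :
           g;
}

/* Hypothetical helper: the same decision written with if statements. */
int pick_if(int a, int b, int c, int d, int e, int f, int g)
{
    if (a) return b;
    if (c) return d;
    if (e) return f;
    return g;
}
```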

~~~
wlievens
The only sane thing is to use if statements, imho. No ninja cowboy engineer
nonsense.

~~~
sukuriant
Sometimes, if-statements unnecessarily add lines, and the lines become harder
to read.

I would say it's a case-by-case statement. Well, okay, more specifically, I
would say that the use of a _single_ ternary if is a case-by-case statement.

~~~
wlievens
Absolutely. I use ternary if when appropriate. But nesting ternary ifs is
horrible, and nesting ternary ifs while discarding parentheses is malevolent.

------
DenisM
Rarely have so many bytes^H octets been wasted in the course of indulging in
such pointlessly misguided indignation. Really, of all the software
engineering problems that industrial reality presents to us, this one deserves
the least attention. I am only thankful this is not some celebrity news, of
which there has been too much here in recent weeks, but that is not a very
high standard to measure against.

------
patio11
OK, I'm not exactly a professional C programmer, but among those who are: is
the use of sizeof(char) here more important than the copy operation that is
going to write the terminating null past the area actually malloc'ed?

Because that sounds suspiciously like "potentially exploitable" to me.

~~~
ssp
No, you are right. The copying is a bad bug. Using sizeof(char) is mostly a
coding style question.

The casting of the malloc() return value is something reasonable people can
disagree about (by which I mean nerds can endlessly flame each other about).
But here are some points:

* In C++, which Mozilla is written in, it is a compile time error to _not_ cast, so most likely this blog post is just wrong. Conceivably, the file could have been a C file in an otherwise C++ project, but even in that case, it's at least understandable why the cast is there.

* If you have a macro that takes the type as a parameter:
    
    
        #define alloc(type)      ((type *)malloc(sizeof(type)))
    

then the casting is a _good_ thing because it makes the compiler warn if you
try to assign the result to the wrong pointer type.

* It is true that if you forget to include the stdlib.h header, the cast will silence a useful warning about converting int to pointer.
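A small sketch of the macro point above, in C. `ALLOC`, `struct point` and `new_point` are hypothetical names. The mismatched assignment is left commented out because a conforming C compiler is expected to diagnose it.

```c
#include <stdlib.h>

/* Hypothetical typed allocator: the cast ties the result to `type`. */
#define ALLOC(type) ((type *)malloc(sizeof(type)))

struct point { int x, y; };

struct point *new_point(int x, int y)
{
    struct point *p = ALLOC(struct point);  /* types match: no diagnostic */
    /* long *q = ALLOC(struct point); */    /* incompatible pointer types */
    if (p != NULL) {
        p->x = x;
        p->y = y;
    }
    return p;   /* caller frees; NULL if the allocation failed */
}
```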

~~~
ww520
In the blog's comment, someone pointed out the same thing but the author then
dismissed it as, "I’m a C programmer, not a C++ programmer, so I wouldn’t know
anything about that." But considering Mozilla is a C++ project, it's unfair to
critique it out of context.

~~~
sid0
Well, the file he's quoting is C (note the lang:c filter). OTOH he's quoting
Mozilla 1.7.7, which is positively ancient (Firefox 1.0.x I think).

------
psyklic
Ridiculous! This is like saying that we shouldn't use parentheses because all
C++ programmers should know order of operations! Not only that, but it's
probably optimized out by the compiler (and if not, it is a quite
insignificant speedup). Better to be safe using sizeof(x) than sorry ...

~~~
chc
Absolutely optimized out. `sizeof()` is evaluated at compile time.
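A short C11 sketch of what "evaluated at compile time" buys you. The names `scratch` and `scratch_size` are made up. The one exception I know of is a variable length array operand, where the size is computed at run time.

```c
#include <stddef.h>

/* Checked by the compiler; no code is generated for this. */
_Static_assert(sizeof(char) == 1, "sizeof(char) is 1 by definition");

/* sizeof on a non-VLA operand is an integer constant expression,
   so it is allowed in a file-scope array dimension. */
char scratch[16 * sizeof(char)];

/* Hypothetical accessor: the size of the array in bytes. */
size_t scratch_size(void)
{
    return sizeof scratch;
}
```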

------
dlsspy
Many things equal one. I like to specify which one I'm talking about when I'm
writing code.

~~~
edanm
What a great sentence. I'm using this the next time I teach someone to program
and have to explain the value of constants! :)

------
zbanks
Is this really important? Sure, the source may be slightly polluted, but it's
not going to slow down the application once it's compiled.

Really, these sorts of things are probably _good_ in code: they make it
explicit that you are dealing with _char_s, and not some other datatype. If
left out, it may look like a bug, causing debugging hassles.

~~~
chc
Agreed. I'm a strong proponent of lean codebases, but I still write
`sizeof(char)`, because otherwise it's a dead ringer for a common mistake.
Code should be concise only to the point that it doesn't become _harder_ to
read.

------
kingkilr
Maybe this is just the shitty programmer in me, but I'm going to go right
ahead using sizeof(char) (I knew it's hardcoded at 1 before this, and I'll
know after this). It appeals to the foolishly consistent programmer in me.

------
pietrofmaggi
Personally I don't agree with the post, starting with his analysis of the
first snippet of code:

    
    
      group->text = (char *) malloc(sizeof(char) * strlen(params->text));
      if (group->text == NULL) {
    	res = MP_MEM;
      }
      strcpy(group->text, params->text);
    

I don't think the sizeof(char) or the pointer cast poses any danger, and both
increase readability (we are not coding to show how well we know the
standard).

The poster is right about the missing space in the malloc call for the NUL
char, but I see a bigger threat in calling malloc with a size that comes from
strlen: it can be really dangerous if you somehow miss a NUL terminator.

And again, the same complaint applies to the usage of _strcpy_. The standard
gives us _strncpy_, which puts a size limit on the copy operation.

_Personally I think that strcpy has to be avoided like the plague, being a
source of buffer overrun errors like no other call in the C library._
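For reference, a hedged sketch of how the quoted allocation could be written without the off-by-one. `dup_text` is a made-up helper, not Mozilla's code. It returns NULL on failure instead of falling through to the copy, and it uses `memcpy` because the length is already known. One caveat on `strncpy`: it does not add a terminator when the source fills the whole destination, so it is not a drop-in fix.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns a heap copy of src, or NULL on failure. */
char *dup_text(const char *src)
{
    size_t len = strlen(src);
    char *copy = malloc(len + 1);   /* +1 for the terminating '\0' */
    if (copy == NULL) {
        return NULL;                /* never copy into a failed allocation */
    }
    memcpy(copy, src, len + 1);     /* copies the terminator as well */
    return copy;                    /* caller frees */
}
```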

~~~
lukatmyshu
gets ... at least you theoretically can use strcpy securely.

------
bruceboughton
Everyone is commenting on his point about sizeof(char). Does anyone have any
comments on his recommendation to use expressions with sizeof rather than
types?

~~~
zbyszek
Personally, I like the idiom

    
    
      T* t = malloc( sizeof *t );
    

It's no big deal, but it's just one less thing to change if you need to change
the type. (I once had to go through some code changing longs to ints to make
it run on a 64-bit machine. Were the mallocs written this way it might have
been a bit easier.)
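The same idiom extends to arrays. A minimal sketch, with `make_counters` as a made-up name: if the element type changes from `long` to `int`, only the declaration needs editing, and the malloc line stays correct.

```c
#include <stdlib.h>

/* Hypothetical helper: allocates n counters, all set to zero. */
long *make_counters(size_t n)
{
    /* The size follows the pointed-to type, whatever it is. */
    long *counters = malloc(n * sizeof *counters);
    if (counters != NULL) {
        for (size_t i = 0; i < n; i++) {
            counters[i] = 0;
        }
    }
    return counters;   /* caller frees; NULL if the allocation failed */
}
```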

And although I agree that his sizeof(char) rage is overblown, is the author's
point about the zlib code not reasonable?

------
hetman
It should be noted that not all C compilers are fully standard-compliant. Many
embedded system vendor compilers, for example, use incorrect operator
precedence. The available libraries can also do things in non-standard ways.

This does not look like advice about coding for the real world, it seems to be
about how to invest your time to maximise feelings of smug superiority.

------
lelele
This article only shows that some C programmers are not able to think at a
higher level than whatever their compiler sees. They don't understand that a
programming language standard is not a style guide.

------
c00p3r
If a char is defined as a number from 0 to 255 and its storage space is
exactly 1 byte, that is correct.

But it is rather a legacy nowadays. No one should use it.

~~~
RiderOfGiraffes
I think you miss the point.

A byte is always 8 bits, and therefore contains unsigned numbers 0 to 255
inclusive.

A char is always defined to be the base size of the machine.

By definition, therefore, sizeof(char) is always 1.

However, a char is not always 1 byte. On some machines a char can hold the
values 0..511, and on others 0..65535. (for example)

~~~
masklinn
> A byte is always 8 bits, and therefore contains unsigned numbers 0 to 255
> inclusive.

That definitely isn't true. Historically, bytes went from 5 to 16 bits, it was
just the number of bits required to handle a character.

If you want to talk specifically about 8-bit bytes in an architecture-
independent manner, use the word "octet" (which happens to be the word French
people generally use).

`char` is a technically independent (though often related) datatype defined as
being >= 8 bits by the ANSI standard and the definition of `sizeof(char) == 1`
is an axiom of the ANSI standard, not the consequence of anything but itself,
though the standard definitely seems to confound (confuse?) chars and bytes:

2 The sizeof operator yields the size (in bytes) of its operand, which may be
an expression or the parenthesized name of a type. The size is determined from
the type of the operand. The result is an integer. If the type of the operand
is a variable length array type, the operand is evaluated; otherwise, the
operand is not evaluated and the result is an integer constant.

3 When applied to an operand that has type char, unsigned char, or signed
char, (or a qualified version thereof) the result is 1. When applied to an
operand that has array type, the result is the total number of bytes in the
array. When applied to an operand that has structure or union type, the result
is the total number of bytes in such an object, including internal and
trailing padding.
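In code, the distinction shows up as `CHAR_BIT` from `<limits.h>`: `sizeof` counts in units of char, and `CHAR_BIT` says how many bits that unit has on a given implementation. A small C11 sketch, where `BITS_OF`, `bits_in_char` and `bits_in_int` are made-up names:

```c
#include <limits.h>
#include <stddef.h>

/* The standard guarantees at least 8 bits per char, not exactly 8. */
_Static_assert(CHAR_BIT >= 8, "a C byte has at least 8 bits");

/* Storage size of a type in bits on this implementation. */
#define BITS_OF(type) (sizeof(type) * CHAR_BIT)

size_t bits_in_char(void) { return BITS_OF(char); }  /* always CHAR_BIT */
size_t bits_in_int(void)  { return BITS_OF(int); }   /* varies by platform */
```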

------
Ayjay
Really, a better coding style is to use STL strings, containers and algorithms
and never get down to the point where you have to care.

Coding at the bare-metal level is a very slow, tedious, error-prone and silly
way to code unless you're doing, or have some external constraint that forces
you to (and, no, efficiency isn't usually a reason not to use C++ or the STL,
it almost always compiles to the same as using strcpy's and malloc's.)

-- Ayjay on Fedang/coding

~~~
billforsternz
The man is writing about C, not C++. C remains important. I agree that C++ and
the STL are good tools, but I don't agree that STL style string handling code
will compile down to the same thing as C style string handling code.

