
C is Lower Level Than You Think - prajjwal
http://prog21.dadgum.com/185.html
======
Nursie
No, C is lower level than _you_ think.

That's perfectly rational behaviour. And yes, the string could change
mid-iteration: in the loop, in another function that has access to the
address, or even in another thread that has just made a guess at a valid
memory address.

This is the beauty and power of C: everything is a piece of memory, nothing is
guaranteed, and no, we will not hold your hand or stop you from doing
something stupid. Someone else has a valid use case for that, even if they
don't know it yet.

~~~
sgift
> This is the beauty and power of C, everything is a piece of memory and
> nothing is guaranteed, and no we will not hold your hand or stop you doing
> something stupid. Someone else has a valid use case for that, even if they
> don't know it yet.

Just replace C with assembler and we know why we should all go back to it. Or
rather not..? Guarantees provide reliable environments. Reliability is really
helpful.

My nitpick aside ... I still want to (finally, really) learn C some time.
Yeah, I can read it more or less, but that's not the same. Maybe on my next
vacation.

~~~
goldenkey
I really don't think you have any standing to speak here if you don't know C.
C is a behemoth, sure, but there are no surprises. And assembler is
platform-specific, so you can't really commit the 'bandwagon'/'slippery slope'
logical fallacy. Stay in Python/Ruby/Tcl world, and leave us hard-core coders
alone. We'll continue making the performant libs that your dynamic language
runtime requires. Thanks.

~~~
sgift
Much real world C code is platform specific too. As written: I can read C
(more or less) and I can recognize things like code that will only work on
your nice little endian platform and will explode in your face on big endian.
Code like that is written all the time in C - especially by programmers that
"continue making the performant libs" or whatever you think you do. So, the
abstraction over assembler is often far smaller than you make it out to be.
But that is a completely different point, which has nothing to do with my last
post.

~~~
goldenkey
sgift, dealing with big- vs. little-endian is a long way from dealing with
different opcodes and instruction sets like SSE and whatnot. It's not "far
smaller". It's huge. You aren't speaking from experience. As someone who used
to do reverse engineering and knows assembly first-hand, I'm not going to
argue with you about the difference between writing code in assembly versus C.
You're incorrect to say that ASM and C are so close. There's a reason certain
routines are implemented in CPU-architecture-specific asm, and that reason is
speed: C is still too high up to make many optimizations. The lack of
fine-grainedness is definitely there with regard to the processor
instructions used.

------
10098
It's weird to me that a post like this even needs to be written. Isn't it kind
of obvious that it's highly problematic for the compiler to do this
optimization because the function, generally speaking, may have side effects?

~~~
username42
For me, this kind of post means that the C language is no longer core basic
knowledge. The new generation has selected a new set of core basics:
JavaScript, HTML5, Python (for teaching algorithms, not for production),
scalability, replication, UX/UI ...

------
fhd2
Here's a gotcha I find more interesting than strlen: Right shifting integers
instead of dividing by powers of two, e.g:

1337 / 8;

Becomes:

1337 >> 3;

Compilers ought to figure that one out, shouldn't they? Well, if 1337 is a
signed int, they can't. Shifting actually has different semantics for negative
numbers, so this optimisation is only possible for unsigned ints. You can
either do the bit shift yourself or cast to unsigned, but it's a leaky
abstraction either way.

~~~
azernik
I'm pretty sure that the different semantics for negative signed ints (sign
extension) are there to make sure it works as division. It's right-shifting
for regular bit manipulation purposes that's unsafe with signed.

~~~
icodestuff
Whether right shift is arithmetic or logical is implementation-defined in C.
The compiler (for obvious reasons) knows what hardware is available, and
generally optimizes a divide by a power of two into a right shift if an
arithmetic shift is available (as it is on most architectures), plus a check
for whether the number is odd and negative (adding 1 to the numerator if so).
All of which is still faster than using the divide hardware.

~~~
azernik
One more instance where I just assumed that what GCC does was in the standard.
Very interesting.

For the curious who don't want to duplicate my look-up: right shift is by
definition division by a power of two _except_ for negative signed numbers,
for which it's implementation-defined. GCC on every architecture I've checked
uses an arithmetic shift for signed values, probably because that extends the
base definition to negatives.

------
jayferd
This is a problem in most imperative-paradigm languages, unfortunately. The
issue is in the semantics of the `for` loop, which make everything confusing
both for the programmer and the compiler.

~~~
10098
I think the issue is that you don't know what's going to happen when you call
an arbitrary function.

------
goldenkey
There's no reason the compiler should even optimize strlen out of the
loop. Here's the proper way to iterate a string when order doesn't matter,
take note:

    
    
       for (int i = strlen(str); i--;)
          ...loopContents...
    

There's a right way to handle values that are repeatedly used, and a wrong
way, i.e. relying on the compiler to optimize. Many dynamic languages store
the length of a string buffer as part of a String object, e.g. str.length, and
it's very cheap to query: just an accessor/property/getter.

What is misguided in this rant is the attack on strlen, which operates on
pointers to zero-terminated strings. The length isn't stored; the only storage
is for the string's bytes. If you want fast access to the length, and you find
yourself pained by the extra variable needed to cache the value, then use
std::string.

This blog post needs a refactor.

~~~
Rangi42
"The proper way"? Not only is it less familiar than the usual "i = 0; i < n;
i++" idiom, requiring the reader to examine exactly what it's doing, but now i
goes from n to 1 instead of 0 to n-1. What's so bad about caching the strlen
result in a variable beforehand?

~~~
goldenkey
You're wrong. Maybe I should hire my next team based on loop analysis ;-)

[http://codepad.org/kd3oOBAb](http://codepad.org/kd3oOBAb)

~~~
vidarh
And I think this perfectly demonstrates why it is a stupid approach to take in
the first place. In fact, personally I have a tendency to avoid for-loops in C
because of the _awful_ syntax; I much prefer using while-loops as I find them
far clearer.

In 25+ years of C, I've seen plenty of people get for-loops wrong (making
broken assumptions about what's evaluated where, plus countless off-by-one
bugs), but everyone gets while-loops.

If you are going to do for-loops in C, stick with the most obvious, simplest
variants possible - the moment you try to be smart, you've substantially
increased the odds that a maintenance programmer will introduce a bug at one
point or another.

~~~
nitrogen
For the sake of job security, I propose using the comma operator to put the
entire loop body within the for() parentheses, and replace the block with a
semicolon:

    
    
      // Pure C99 (prints an invalid character at the end); needs <stdio.h>
      for(
          FILE *f = fopen("/etc/fstab", "r");
          !feof(f) || (fclose(f), 0);
          putchar(fgetc(f))
          ) ;
    
      // C99+POSIX; needs <fcntl.h> and <unistd.h>
      for(
          int fd = open("/etc/fstab", O_RDONLY), c = 0;
          read(fd, &c, 1) || (close(fd), 0);
          write(1, &c, 1)
          ) ;

------
vsbuffalo
This seems like surprising behavior because we expect compilers to make wise
choices. Beginning programmers do the same; for example, I've seen something
like:

    
    
        for x in stuff_list:
            print other_list_of_stuff.count(x)
    

This is intuitive to beginning programmers and uses Python's built-in methods.
But there's hidden complexity behind it: count scans the whole list on every
call. C is no worse a language for not moving strlen out of the loop than
Python is for not optimizing this. Languages work at their level, and it's
essential that the programmer understand the precise bounds of that level.

~~~
smsm42
Nobody who has tried to write code that compiles something (I'm not even
talking about an optimizing C compiler, just a parser for any sufficiently
rich format that humans can touch) would expect compilers to make many wise
choices. Making code that behaves correctly and predictably, according to what
people expect, and is "wise" on top of that, is very hard. It's easy to say
"oh, it's clear I meant foo when I wrote bar", but try to write code that
identifies the cases where "bar" means "foo" with 100% accuracy and you'll see
why compilers aren't as wise as some think they could be.

------
augustk
The problem here is that the for loop in C is a direct generalization of the
while loop. In a Pascal-like language like Oberon the bound is only calculated
once, so

    
    
        FOR i := 0 TO Strings.Length(s) - 1 DO
           ...
        END
    

would be equivalent to

    
    
        i := 0;
        lim := Strings.Length(s) - 1;
        WHILE i <= lim DO
           ...
           INC(i)
        END

------
DmitryNovikov
P.S. We have a PVS-Studio rule for this issue:
[http://www.viva64.com/en/d/0309/](http://www.viva64.com/en/d/0309/)

------
aneeskA
[http://prog21.dadgum.com/179.html](http://prog21.dadgum.com/179.html)

This is my favourite article of his :)

------
joshguthrie
I don't see how a compiler's smartness factors into a language's level.

------
Theriac25
tl;dr if you don't know what you're doing, you don't know what you're doing.

