
Walter Bright on C's Biggest Mistake - kssreeram
http://dobbscodetalk.com/index.php?option=com_myblog&show=Cs-Biggest-Mistake.html&Itemid=29
======
asciilifeform
Lack of array bounds checking is not a problem with C.

It is a problem _with our hardware._

Introducing bounds checking without introducing a penalty on array access time
is impossible on our "C machines".

C/C++ are often thought of as "close to the metal" - but they are close to
_particular varieties_ of metal - those designed to run C/C++. We arrived at
them through historical accident. There are many other ways to build a
computer - and it is not entirely obvious that a "C architecture" is
necessarily the simplest or most efficient:

<http://www.loper-os.org/?p=46>

That a language which is "close to the metal" is braindead is _solely a
consequence of braindead metal._

The "C architecture" is a universal standard, to the extent that it has become
the definition of a computer to nearly everyone. This is why you will never
find the phrase "C architecture" in a computer architecture textbook. And yet
it is a set of specific design choices and obsolete compromises, to which
there _are_ alternatives.

------
shadytrees
See also the C FAQ, which patiently devotes 24 questions to the topic. (You
can almost tell just _how_ frequently the question came up on the list.)

<http://c-faq.com/aryptr/index.html>

~~~
weaksauce
Do buffer overflows occur anywhere other than in char* buffers with no bounds
checking? If not, this single fact is the one behind so many of the software
vulnerabilities in the wild.

~~~
tptacek
Yes, memcpy overflows are just as common as strcpy overflows, and structure
overflows are more common today than strings, if only because most of the
trivial string stuff has been flushed out by now.

------
xcombinator
Thank god for this mistake; this mistake makes C what it is good at: low-level
programming. It just passes addresses between functions. Light and fast, no
abstractions.

I love it: a way of doing assembler-like coding, but multi-platform.

If I want high-level programming I will program in another language, but when
you want machine control you have C without all the bloat.

~~~
10ren
_C combines the power and performance of assembly language with the
flexibility and ease-of-use of assembly language._

However, I was amazed to find that modern assembly language (since I was last
in the game 25 years ago) has many high-level concepts in it (structures,
loops, conditions etc), and looks... _suspiciously_... C-like.

But you're quite right about portability. Although C is famously not perfectly
portable ( _int_ sizes, all those #defines - just some of the issues Java
tackled), it is a _hell_ of a lot more portable than an actual assembly
language. :-)

~~~
ori_b
> (structures, loops, conditions etc), and looks... suspiciously... C-like.

Which assembly? All the ones I know of - _especially_ the modern ones - have
become even lower level over time, as compiler writers came to prefer
regularity over more powerful instructions.

~~~
nitrogen
The assembly language used by Sharc DSPs has an algebraic syntax rather than
using all mnemonics, and loops are pretty easy to create as well (at least
compared to x86 or, as my current project uses, PIC).

~~~
10ren
Can you give an example of a loop in an algebraic syntax please?

(All I can think of is CFG style, like "A -> (aA)?" oh, of course, there's
also "a*" - still, it would be interesting to see an example.)

------
CrLf
People are permanently trying to "fix" C, but C has nothing to fix.

It is a limited language, constrained both by the state of the art at the time
of its creation and by the problem space where it has been used over the
years. And that's how it should be.

C is part of an ecosystem of languages; it doesn't have to be changed to
accommodate the latest fads or to fix problems that never stopped it from
being widely used for decades.

If C doesn't fit a purpose, don't use it. You don't even have to stray too
far, since there are a few languages that basically are just C with extras.

~~~
gchpaco
We have been awash for years in buffer overflows and other, similar errors
(printf format strings come to mind) that are actually impossible in a safer
language. SQL injection can happen in a safer language, but you can't take
over the web server with it. There is nothing fundamental about systems
languages that requires unsafe array operations. This is a _flaw_ , and it is
a flaw of C specifically and a flaw inherited by many C-descended languages.
This is not some ivory tower thing that was discovered after C was designed;
it was apparent even at the time (although Pascal's fix was pretty bad,
variable-length arrays fix it neatly). There are compiler articles from the
late 70s and early 80s pointing out how even a naïve compiler could easily
optimize out bounds checking in most operations!

~~~
Retric
If you design the software correctly then array bounds checking is often a
waste of resources. For a stupid example let's assume you have 3 arrays of the
same size and you are doing this.

    
    
      for (int i = 1; i < 10000; i++)
      {
        a[i] = i * i;
        b[i] = a[i] * i;
        c[i] = b[i] * i; 
      }
    

Now that's not a lot of code but with array bounds checking you add 50,000
bounds checks that do nothing useful if the arrays are of the correct size.
Clearly there are uses where those bounds checks are useful, but when you care
about speed they can become fairly costly.

You might even want to rewrite it like this, because it really is faster:

    
    
      for (int i = 1; i < 10000; i++)
      {
        c[i] = i * (b[i] = i * (a[i] = i)); 
      }
    

PS: Ugly C code often has a very good reason for looking the way it does.

~~~
gchpaco
Those are _precisely_ the examples where even the most naive late-70s compiler
can optimize out the bounds checking, as is well documented in the literature
(presuming the sizes are >= 10000, of course). All it takes is a validity
range: noting that i goes from 1 to 9999 and that at each access i is within
the range of the array. To defeat the compiler's optimizations you have to
start doing nontrivial mathematics on the array indices - and those are
exactly the accesses that are least obvious and thus most in need of the
bounds checks.

~~~
Retric
I would be impressed if that worked correctly on multithreaded code or if it
could survive some of the more esoteric pointer manipulations you can do in C.
It's mathematically impossible to make the perfect language for all problems,
so while many languages have been built that are a "safe" version of C they
lose something for everything they gain.

PS: Even compiler bugs can be useful under the correct circumstances.

------
kssreeram
I feel the lack of a module system is the biggest mistake in C. It is tiresome
to prefix every single public function: list_append, list_delete,
hashmap_insert etc.

------
InclinedPlane
I'd say using null-terminated strings rather than Pascal-style
length-embedded strings is C's biggest mistake. It is responsible for so many
inefficiencies (strlen is O(n) instead of O(1) as it should be) and, worse
yet, so many incredibly serious security vulnerabilities.

All to avoid incurring a 1-3 byte per-string overhead, or having to figure
out how to efficiently work around a 255-character limit.

~~~
nitrogen
Other than strlen, string operations can be faster with null-terminated
strings. Plus, there are other benefits:

If you want to turn some arbitrary data into a string, just put a 0 byte where
you want it to end.

Instead of having to increment and compare both a counter and a pointer (or
store a final pointer for comparison) in string-manipulation operations, you
just increment the pointer. It makes for very concise loops in strchr, etc.

Tokenizing a string is just a matter of throwing down 0 bytes where the tokens
are (as strtok does).

Passing a trailing subset of a string to a function is as easy as adding an
integer to the string pointer, rather than requiring a memcpy and length
calculation.

~~~
philwelch
Null-terminating arbitrary data to turn it into a string is great--if your
arbitrary data doesn't have any 0 bytes in it naturally.

Strtok is just as easily handled by representing strings as a 2-item struct:
a size_t for length and a pointer (separating the size data from the character
array). In fact, that would kick ass: you can non-destructively tokenize, pass
around substring arguments, etc.

~~~
nitrogen
Storing size separately from the character data, rather than as a prefix to
the character data, would indeed alleviate some of the potential problems of a
size+data string. I could see myself using such an implementation in
situations where the minimal overhead of allocating a separate area on the
stack or heap for the size+pointer struct is negligible. Expanding strings
would certainly be a lot easier, and it would open the possibility (as you
mentioned) for complex data sharing among strings, similar to Qt4.

------
Luyt
When I was reading this, I thought "Nooo! Don't make the size of an array part
of its type!" That has been rightfully shown to be a very bad idea by Brian
Kernighan; see <http://www.lysator.liu.se/c/bwk-on-pascal.html>. Luckily the
proposal is about passing a 'fat pointer': really a pointer and a length. I
did that often in my C programs too: int process(char *buf, int buflen);

Maybe this fix to C's Biggest Mistake, a.k.a. the 'fat pointer', is just
syntactic sugar.

~~~
nimrody
I don't think Walter suggested having the array size part of the type (static
array types).

He suggested using "fat pointers" -- pointers along with their extent. This is
similar to how many Pascal compilers treat the type "String".

Kernighan mentions Pascal strings in his article but claims the solution does
not scale to other types. Walter's solution does work for all array types (but
admittedly has other problems).

------
coliveira
I think the preprocessor is the biggest mistake. It was introduced as a simple
way to address separate compilation, but it has generated more trouble than
benefit.

~~~
__david__
I understand the issues with the pre-processor, but I still think C is better
off with it than without it. I know it can be abused horrifically but it can
also be abused in really nice and convenient ways. If you construct your
macros right they can be useful or even wonderful. In that sense the
pre-processor is very C-like.

The things that make macros actually nice are some of the gcc extensions,
like the ({ }) block expression syntax and typeof().

------
rbranson
I don't think arrays are "converted" to pointers. Arrays are simply a cleaner
way of doing pointer arithmetic and allocating large(r) blocks of the stack.
Nothing is lost in this "conversion." The array never knows its own
dimensions beyond the time you declare it. It's up to the developer to keep
track of that.

~~~
agazso
Here is an example that caused me a few bugs.

    
    
      // define a new type called md5_t
      typedef char md5_t[33];
      md5_t g_md5;
      // here sizeof(g_md5) == sizeof(md5_t)
      
      void f(md5_t md5)
      {
        // here sizeof(md5) == sizeof(char*)
      }
    

The type information is definitely lost inside functions.

~~~
rbranson
This works, sure, but if you do:

    
    
            char test[1024][32];
        printf("%zu\n", sizeof(test));
    

You'll get 1024 * 32 (32,768). It didn't know that it's a 2D array. When you
pass a stack-allocated array, it passes by pointer. It works this way because
C is portable assembler.

~~~
barrkel
There is no logical connection between the use of the `sizeof` operator and
differentiating between 1D and 2D arrays, so "it didn't know that it's a 2D
array" makes no sense.

Secondly, it's not really meaningful to talk about "pass by pointer".
Parameters are generally either passed by value, by reference, or by name. C
only does pass by value; to emulate pass by reference, you need to pass the
address by value, as a pointer - or as an "array". But the only array type you
can declare is missing the crucial dimension length aspect.

Finally, saying that C works the way it does because it's portable assembler
does not have much explanatory power. Arrays are not usually machine-level
concepts; pointers with appropriate declaration syntaxes for static and
automatic block allocations could have taken their place in C. Since C does
have a higher-level construct called an array, it's not a big leap to imagine
it being better designed.

------
giardini
C's biggest mistake would have to be C++.

