
Conflating pointers with arrays: C's biggest mistake? (2009) - etrevino
https://www.digitalmars.com/articles/b44.html
======
xroche
This is IMHO by far NOT C's biggest mistake. Not even close. A typical compiler
will even warn you when you do something stupid with arrays in function
definitions (-Wsizeof-array-argument is the default nowadays).

On the other hand, UBE (undefined or unspecified behavior) is probably the
nastiest stuff that can bite you in C.

I have been programming in C for a very, very long time, and I am still
getting hit by UBE from time to time, because, eh, you tend to forget "this case".

Last time, it took me a while to realize the bug in the following code snippet
from a colleague (not the actual code, but the idea is there):

        struct ip_weight { in_addr_t ip; uint64_t weight; };

        const struct ip_weight ipw1 = {0x7F000001, 1};
        const struct ip_weight ipw2 = {0x7F000001, 1};

        const uint32_t hash1 = hash_function(&ipw1, sizeof(ipw1));
        const uint32_t hash2 = hash_function(&ipw2, sizeof(ipw2));

The bug: hash1 and hash2 are not the same. For those who are fluent in C UBE,
this is obvious, and you'll probably smile. But even for veterans, you tend to
miss that after a long day of work.

This, my friends, is part of the real mistakes in C: leaving too many UBE. The
result is coding in a minefield.

[You probably found the bug, right? If not: the obvious issue is that 'struct
ip_weight' needs padding before the second field. And while all omitted fields
are by the standard initialized to 0 when you declare a structure on the
stack, the padding bytes have unspecified values; gcc typically leaves the
padding with whatever dirty content was on the stack.]
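
One way to sidestep the padding problem described above is to hash a packed copy of the fields instead of the struct's raw bytes. The sketch below is only an illustration: `uint32_t` stands in for `in_addr_t`, and `hash_bytes` (FNV-1a) is a hypothetical stand-in for the `hash_function` in the snippet, whose real definition is not shown.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Stand-in for the struct in the comment; uint32_t replaces in_addr_t
   so that no network headers are needed. */
struct ip_weight {
    uint32_t ip;
    uint64_t weight;
};

/* FNV-1a over raw bytes; a hypothetical stand-in for hash_function(). */
static uint32_t hash_bytes(const void *data, size_t len)
{
    const unsigned char *p = data;
    uint32_t h = 2166136261u;
    for (size_t i = 0; i < len; i++) {
        h ^= p[i];
        h *= 16777619u;
    }
    return h;
}

/* Copy the fields back to back into a buffer that has no padding,
   then hash that buffer. Padding bytes in *w are never read. */
static uint32_t hash_ip_weight(const struct ip_weight *w)
{
    unsigned char buf[sizeof w->ip + sizeof w->weight];
    memcpy(buf, &w->ip, sizeof w->ip);
    memcpy(buf + sizeof w->ip, &w->weight, sizeof w->weight);
    return hash_bytes(buf, sizeof buf);
}
```

With this, two structs holding the same `ip` and `weight` hash the same regardless of what the padding bytes contain.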

~~~
blub
Interestingly, the idea of hashing over the bytes of a data structure would be
possible in C++, but it is non-idiomatic and instead one would use hash
functions for each type and in the case of a struct the individual members
would be accessed to build the hash value.

The idea of using bytes is error-prone, but now that you mentioned it, pretty
typical of the C mindset.

Of course C++ also has some of these cultural biases. I think they're an
important reason why unsafe code continues to be written.

~~~
bluecalm
I object to reinterpreting bytes of one object as object of another type being
C mindset. Yes, this is done sometimes if you really need it (compression
being one example) but it's not really something you write everyday in C and
doing it to calculate hash is just lazy.

~~~
amluto
Except that hashing a block of memory is likely to be _much_ faster than
hashing individual fields once there are more than a couple fields.

IMO the right solution would be a special annotation on a struct that says “I
want the logical value of this struct to uniquely determine the bytes of the
struct’s in-memory representation.”

Of course, adding such an attribute without nasty edge cases may be tricky.
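
Standard C has no such annotation, but a rough approximation is to lay the struct out with an explicit reserved field and let a C11 `_Static_assert` reject any layout that still contains padding. This is a sketch under assumptions: the struct name and the `reserved` field are made up for illustration, and it relies on the fixed-width integer types having no padding bits.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical padding-free variant of the struct from the thread. */
struct ip_weight_np {
    uint32_t ip;
    uint32_t reserved;   /* explicit field where the compiler would pad */
    uint64_t weight;
};

/* Fails to compile if the compiler inserted any padding. */
_Static_assert(sizeof(struct ip_weight_np) ==
                   2 * sizeof(uint32_t) + sizeof(uint64_t),
               "struct ip_weight_np must not contain padding");

/* With no padding, equal field values imply equal bytes. */
static int same_bytes(const struct ip_weight_np *a,
                      const struct ip_weight_np *b)
{
    return memcmp(a, b, sizeof *a) == 0;
}
```

The check is per struct and has to be maintained by hand, which is part of why a language-level attribute would be nicer.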

~~~
Tarean
But this mostly matters for arrays and then an alternative optimization might
be to store it as a structure of arrays. This doesn't need the padding and is
more simd friendly.
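
A minimal sketch of the structure-of-arrays layout, assuming a small fixed table; the names `ip_weight_table`, `MAX_ENTRIES` and `fnv1a_update` are invented for the example. Each field gets its own array, and array elements are contiguous, so hashing the used part of each array never touches padding.

```c
#include <stddef.h>
#include <stdint.h>

enum { MAX_ENTRIES = 8 };

/* Structure of arrays: each field lives in its own dense array. */
struct ip_weight_table {
    uint32_t ip[MAX_ENTRIES];
    uint64_t weight[MAX_ENTRIES];
    size_t   count;               /* number of entries in use */
};

/* One FNV-1a pass over a byte range, continuing from hash h. */
static uint32_t fnv1a_update(uint32_t h, const void *data, size_t len)
{
    const unsigned char *p = data;
    for (size_t i = 0; i < len; i++) {
        h ^= p[i];
        h *= 16777619u;
    }
    return h;
}

/* Hash only the used prefix of each array. */
static uint32_t hash_table(const struct ip_weight_table *t)
{
    uint32_t h = 2166136261u;
    h = fnv1a_update(h, t->ip, t->count * sizeof t->ip[0]);
    h = fnv1a_update(h, t->weight, t->count * sizeof t->weight[0]);
    return h;
}
```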

------
drfuchs
The proposal here is way too vague. And if you flesh it out, things start to
fall apart: If nul-termination of strings is gone, does that mean that the fat
pointers need to be three words long, so they have a "capacity" as well as a
"current length"? If not, how do you manage to get a string variable on the
stack if its length might change? Or in a struct? How does concatenation work
such that you can avoid horrible performance (think Java's String vs.
StringBuffer)? On the other hand, if the fat pointers have a length and
capacity, how do I get a fat pointer to a substring that's in the middle of a
given string?

Similar questions apply to general arrays, as well. Also: Am I able to take
the address of an element of an array? Will that be a fat pointer too? How
about a pointer to a sequence of elements? Can I do arithmetic on these
pointers? If not, am I forced to pass around fat array pointers as well as
index values when I want to call functions to operate on pieces of the array?
How would you write Quicksort? Heapsort? And this doesn't even start to
address questions like "how can I write an arena-allocation scheme when I need
one"?

In short, the reason that this sort of thing hasn't appeared in C is not
because nobody has thought about it, nor because the C folks are too hide-
bound to accept a good idea, but rather because it's not clear that there's a
real, workable, limited, concise, solution that doesn't warp the language far
off into Java/C#-land. It would be great if there were, but this isn't it.

~~~
WalterBright
I happened to know the idea does work, and has been working in D for 18 years
now.

> If nul-termination of strings is gone, does that mean that the fat pointers
> need to be three words long, so they have a "capacity" as well as a "current
> length"?

No. You'll still have the same issues with how memory is allocated and
resized. But, once the memory is allocated, you have a safe and reliable way
to access the memory without buffer overflows.

> If not, how do you manage to get string variable on the stack if its length
> might change? Or in a struct? How does concatenation work such that you can
> avoid horrible performance (think Java's String vs. StringBuffer)?

As I mentioned, it does not address allocating memory. However, it does offer
one performance advantage in not having to call strlen to determine the size
of the data.

> On the other hand, if the fat pointers have a length and capacity, how do I
> get a fat pointer to a substring that's in the middle of a given string?

In D, we call those slices. They look like this:

    
    
        T[] array = ...
        T[] slice = array[lower .. upper];
    

The compiler can insert checks that the slice[] lies within the bounds of
array[].
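
For readers who think in C, the slice idea can be approximated by hand with a two-word struct. This is a sketch only, not how D implements it: `int_slice`, `slice_sub` and `slice_get` are hypothetical names, and the checks are explicit calls rather than something the compiler inserts.

```c
#include <stdbool.h>
#include <stddef.h>

/* A "fat pointer": start address plus element count. */
struct int_slice {
    int   *ptr;
    size_t len;
};

/* Roughly s[lower .. upper]; fails if the range leaves the parent slice. */
static bool slice_sub(struct int_slice s, size_t lower, size_t upper,
                      struct int_slice *out)
{
    if (lower > upper || upper > s.len)
        return false;
    out->ptr = s.ptr + lower;
    out->len = upper - lower;
    return true;
}

/* Bounds-checked element read. */
static bool slice_get(struct int_slice s, size_t i, int *out)
{
    if (i >= s.len)
        return false;
    *out = s.ptr[i];
    return true;
}
```

A sub-slice carries its own length, so a callee that receives it cannot index past the range it was handed without the check failing.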

> Am I able to take the address of an element of an array?

Yes: `T* p = &array[3];`

> Will that be a fat pointer too?

No, it'll be a regular pointer. To get a fat pointer, i.e. a slice:

    
    
        slice = array[lower .. upper];
    

> How about a pointer to a sequence of elements?

Not sure what you mean. You can get a pointer or a slice of a dynamic array.

> Can I do arithmetic on these pointers?

Yes, via the slice method outlined above.

> If not, am I forced to pass around fat array pointers as well as index
> values when I want to call functions to operate on pieces of the array?

No, just the slice.

> How would you write Quicksort? Heapsort?

Show me your pointer version and I'll show you an array version.

> And this doesn't even start to address questions like "how can I write an
> arena-allocation scheme when I need one"?

The arena will likely be an array, right? Then return slices of it.
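
As a rough C illustration of "the arena is an array, return slices of it", here is a bump allocator that hands out pointer-plus-length pairs. The names are hypothetical, and alignment is deliberately ignored to keep the sketch short; a real arena would round `used` up to the required alignment.

```c
#include <stddef.h>

/* Pointer plus length, standing in for a slice. */
struct byte_slice {
    unsigned char *ptr;
    size_t         len;
};

/* The arena is just a byte array with a high-water mark. */
struct arena {
    unsigned char *base;
    size_t         cap;
    size_t         used;
};

/* Carve n bytes off the arena; returns an empty slice when full. */
static struct byte_slice arena_alloc(struct arena *a, size_t n)
{
    struct byte_slice s = { NULL, 0 };
    if (n > a->cap - a->used)
        return s;
    s.ptr = a->base + a->used;
    s.len = n;
    a->used += n;
    return s;
}
```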

~~~
drfuchs
How exactly then would one declare a stack variable that is a string that is
initialized to "abc", and then later has "1234" concatenated to it, so it's
now "abc1234", without using the heap in any way? If the answer is "you can't
do that", then that's fine for Java/C#/D, but not for C.

~~~
Too
You can't do that in C either, unless 1) the "abc" string happens to be the
last variable on the stack (good luck with that), or 2) you pre-allocate the
string with a bit of extra room, which is not exactly a "solution".

Maybe option 1 is feasible but not really practical. I can only see it being
used in _extremely_ low level stuff with a non-standard compiler and going
through tons of hoops like pre-creating loop variables used later in the
function and disabling the optimizer.

~~~
drfuchs
        char foo[9];
        strcpy(foo, "abc");
        strcat(foo, "1234");

And yes, that's specifically written to be awful and unsafe, but there are
circumstances where you need to be close to the metal and carefully resort to
more complicated variations of such things. That's what C is fairly uniquely
appropriate for.

~~~
WalterBright
Your example uses three strlen's. It also is at high risk for buffer
overflows. Whenever I review code like this, sure as shootin', there's an
error in it in the lengths somewhere. Here's the same code using dynamic
arrays:

    
    
        char foo[9]; foo[] = "abc"; foo[3..3+5] = "1234";
    

No unchecked buffer overflows, and no calls to strlen. The +5 puts the
terminating 0 on.

~~~
drfuchs
You seem to be confused about the implementation of C's standard string
library. There are certainly not 3 strlen calls underneath a strcpy() plus a
strcat() call.

But what's the mention of "terminating 0"? The article says that terminating
zeros should not be needed under its proposal; and that's what I was saying
didn't make sense. [Added:] So, if you didn't just happen to know that the
global string variable foo contains a string that was 3 characters long, how
would you concatenate "1234" to it? I don't see any way without either double-
fat pointers, or terminating NUL.

~~~
WalterBright
strcpy(s1,s2) does a strlen on s2.

strcat(s1,s2) does a strlen on s1 and s2.

Now, two of the strlen's can be replaced with byte-by-byte copies checking for
0 for each, but that tends to lose the efficiency that a memcpy would bring,
so you're pretty much suffering from it anyway.

BTW, here's the strcat I wrote eons ago:

[https://github.com/DigitalMars/dmc/blob/master/src/CORE32/ST...](https://github.com/DigitalMars/dmc/blob/master/src/CORE32/STRCAT.ASM)

It does do two strlen's (the repne scasb instructions). With the improvements
in CPUs since there are probably better ways to write it, but that was pretty
good for its day.

Here's strcpy:

[https://github.com/DigitalMars/dmc/blob/master/src/CORE32/ST...](https://github.com/DigitalMars/dmc/blob/master/src/CORE32/STRFUNC.C#L194)

which does the test-every-byte method. I think Steve Russell wrote it, but I'm
not sure.

If there's anything efficiently implemented in a C compiler, it's memcpy.
Being able to implement string processing in terms of memcpy leverages that
very nicely. strcat() and strcpy() don't leverage it.

Which do you think is faster (s2 is 1024 bytes long)?

    
    
        strcpy(s1, s2);
        memcpy(s1, s2, 1024);
    

I've dramatically sped up a lot of my code and other people's by replacing
the strxxx functions with memcpy. It's low hanging fruit and one of the first
things I look for.
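
To make the memcpy point concrete, here is a sketch of appending to a fixed buffer when lengths are tracked instead of rediscovered with strlen. It is not the D mechanism discussed in this thread: `strbuf` and `strbuf_append` are invented names, and the struct carries a capacity as well as a length, which is one possible answer to the "three words" question rather than the article's proposal.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Caller-provided storage plus current length and capacity. */
struct strbuf {
    char  *data;
    size_t len;
    size_t cap;
};

/* Append n bytes with one bounds check and one memcpy; no NUL needed. */
static bool strbuf_append(struct strbuf *b, const char *src, size_t n)
{
    if (n > b->cap - b->len)
        return false;
    memcpy(b->data + b->len, src, n);
    b->len += n;
    return true;
}
```

Building "abc1234" in a stack array then takes two calls with known lengths 3 and 4, and an append that does not fit is refused instead of overflowing.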

~~~
shakna
I'm usually one to lean on your knowledge, but looking at glibc's strcpy [0],
I don't see a strlen. I do see two macros, CHECK_BOUNDS_LOW and
CHECK_BOUNDS_HIGH, which get defined about here [1].

Am I missing something?

[0]
[https://github.com/lattera/glibc/blob/master/string/strcpy.c](https://github.com/lattera/glibc/blob/master/string/strcpy.c)

[1]
[https://github.com/lattera/glibc/blob/master/sysdeps/generic...](https://github.com/lattera/glibc/blob/master/sysdeps/generic/bp-checks.h#L28)

~~~
WalterBright
Good question. I'll preface this by saying that I am no expert on gcc
compiler internals; I am assuming how I'd do it.

1. This implementation tests every byte, as discussed in other posts here.
That makes it slow.

2. This implementation is likely not used - the gcc compiler probably has an
internal code sequence it emits for a strcpy.

------
Animats
Yes, that's C's biggest mistake. (But remember, they had to cram the compiler
into a 16-bit machine.) No, "fat pointers" are not a backwards-compatible
solution. They've been tried. They were a feature of GCC at one time, used by
almost nobody.

I once had a proposal on this. See [1]. Enough people looked it over to find
errors; this is version 3. The consensus is that it would work technically but
not politically.

The basic idea is that the programmer knows how big the array is; they just
don't have a way to tell the compiler what expression defines the length of
the array. Instead of

    
    
        int read(int fd, char buf[], size_t n);
    

you write

    
    
        int read(int n; int fd, char (&buf)[n], size_t n);
    

It generates the _same calling sequence._ Arrays are still passed as plain
pointers. But the compiler now knows how big "buf" is, both on the caller and
callee side, and can check.

I also proposed adding slice syntax to C, so, when you want to talk about part
of an array, you do it as a slice, not via pointer arithmetic.

The key idea here is that you can call old code from new ("strict") code, and
strict code from old code. When you get to all strict code, subscript errors
should be all checkable.

[1]
[http://www.animats.com/papers/languages/safearraysforc43.pdf](http://www.animats.com/papers/languages/safearraysforc43.pdf)

~~~
mFixman
> I also proposed adding slice syntax to C, so, when you want to talk about
> part of an array, you do it as a slice, not via pointer arithmetic.

I highly disagree with this. One of the advantages of conflating pointers with
arrays is an obvious and very consistent way of indexing and slicing on the
entire language that has minimal syntactic baggage.

~~~
pjmlp
Yes, because typing ptr = &array[0] vs ptr = array is so hard.

------
MrBingley
I absolutely agree. Adding an array type to C that knows its own length would
solve so many headaches, fix so many bugs, and prevent so many security
vulnerabilities it's not even funny. Null terminated strings? Gone! Checked
array indexing? Now possible! More efficient free that gets passed the array
length? Now we could do it! The possibilities are incredible. Sadly, C is so
obstinately stuck in its old ways that adding such a radical change will
likely never happen. But one can dream ...

~~~
m_mueller
I’ll add to this that C having committed to this mistake is one of the main
reasons some people (scientific programmers) are still using Fortran. Arrays
with dimensions, especially multidimensional ones, allow for a lot of
syntactic sugar that is very useful, such as slicing.

~~~
geoalchimista
Modern Fortran (90 to 2008) has evolved a lot regarding array arithmetic and
broadcasting, yet still maintains backward compatibility. I don't think that
couldn't be done in C, but as many have pointed out, the problem seems to be
why bother when there are already C++/D/Java/C#/Go/Rust ...

However, I'd recommend people who deal heavily with multidimensional arrays
but couldn't sacrifice the low-level C environment for a dynamic language to
consider using the ISO_C_BINDING of Fortran 2003. It provides fully C
compatible native types, and can be compiled together with C (you get gfortran
from GCC anyway).

~~~
macintux
Without knowing Fortran, I’d speculate it’s easier to maintain backwards
compatibility in a language that doesn’t have as direct a mapping to hardware
as C. Fortran seems to have more abstractions built in.

~~~
geoalchimista
That's true. It predated C but even then abstracted the user away from the
hardware (and still does). I wouldn't suggest any use of Fortran beyond number
crunching and array arithmetic.

------
WalterBright
Just for fun, type in this program:

    
    
        int fred(int a[10]) {
            return a[11];
        }
    

It compiles without error with gcc and clang, even with -Wall. The code
generated by clang is:

    
    
        mov EAX,02Ch[RDI]
        ret
    

i.e. buffer overflow, even though the array size is given. Compile the
equivalent DasBetterC program:

    
    
        int fred(ref int[10] a) {
            return a[11];
        }
    
        fred.d(2): Error: array index 11 is out of bounds a[0 .. 10]
    

And the 32 bit code generated (when using 9 instead of 11 so it will compile):

    
    
        mov     EAX,024h[EAX]
        ret
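
For comparison, plain C can keep the dimension if the function takes a pointer to the whole array rather than an array parameter, because `int (*)[10]` does not decay to `int *`. A minimal sketch, with made-up function names; the check is still written by hand rather than inserted by the compiler.

```c
#include <stddef.h>

/* sizeof *a is the size of the whole 10-element array here,
   not the size of a pointer. */
static size_t element_count(int (*a)[10])
{
    return sizeof *a / sizeof (*a)[0];
}

/* Manual bounds check that uses the retained dimension. */
static int checked_get(int (*a)[10], size_t i, int fallback)
{
    return i < element_count(a) ? (*a)[i] : fallback;
}
```

The caller has to write `checked_get(&arr, 11, -1)`, and passing the address of an array of a different size is a constraint violation that compilers diagnose.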

------
bluetomcat
Quite surprised to see this not mentioned. C99 allows you to use the "static"
keyword in array function parameters like this:

    
    
        void foo(int arr[static 10]);
    

It cannot check whether a passed pointer will point to enough space, but the
compiler can warn you if you pass a fixed-size array of a smaller size.
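
A small sketch of the C99 form in use; `sum10` is a hypothetical function. The `static 10` is a promise from the caller that at least 10 elements are accessible, and whether a too-small argument is diagnosed depends on the compiler.

```c
/* The caller promises arr points to at least 10 ints. */
static int sum10(const int arr[static 10])
{
    int total = 0;
    for (int i = 0; i < 10; i++)
        total += arr[i];
    return total;
}
```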

~~~
WalterBright
Dynamic arrays are far, far more common than static ones.

~~~
bluetomcat
I beg to differ. In C especially, static arrays are quite common as struct
members and as static objects at file scope, because dynamic allocations are a
pain to manage and unnecessary when the maximum expected size is reasonably
small.

When the size of such arrays is computed at compile-time via macro
definitions, that feature is quite handy.

~~~
WalterBright
> when the maximum expected size

I pretty much never use static arrays because if I do I always without fail
get a bug report when some user exceeds it.

------
WalterBright
Apparently someone posted this here because of my remark:
[https://www.reddit.com/r/programming/comments/90ov9i/a_respo...](https://www.reddit.com/r/programming/comments/90ov9i/a_response_to_a_comment_on_the_riscv_isadev_list/e2thyts/)

Nice to see it get such a nice response!

------
chmike
From my experience Go's array (slice) is a far better solution. It does not
only carry the size (number of elements), it also carries the array buffer
capacity. To me it's the epitome of what arrays should be.

~~~
zaphirplane
Usually people have to refer to a slice cheat sheet to work with it, perhaps
it’s not an intuitive concept/api

~~~
burntsushi
That's because of a lack of named methods to perform common operations. It has
nothing to do with the fact that slices are fat pointers.

Also, nobody I know constantly looks at a cheat sheet. The concepts motivating
the various slice transformations get ingrained pretty quickly.

------
speedplane
Gimme a break, making stricter requirements on C arrays may theoretically make
some things easier, but we’re talking 1% improvement. What makes C hard (and
great) is requiring an understanding of not just memory, but memory allocation
and deallocation schemes. For many beginners this is hard conceptually, but
for everyone, keeping track of allocated and unallocated memory is extremely
difficult.

~~~
mankash666
Disagree. C was the first language I/we learnt, and it's still my favorite.

It's a bit like the first language you learn. For someone from the Latin
family of languages, Mandarin's verbal & written structure might seem hard,
but for native Chinese, it's second nature.

~~~
speedplane
Mandarin may be your first and favorite, but that doesn't make it easy to
learn. Same with C.

------
ufmace
I haven't written much C, and I don't have a firm opinion on whether or not
that particular issue is C's biggest mistake. I do think that just this one
change sounds radical enough, as far as the effort it would take to convert
existing C code that uses the high-risk pattern, that it seems better to just
wholesale convert to a language that already mandates safety like Rust or
Java. Particularly when you consider all of the other high-risk patterns in C
that these other languages eliminate.

------
User23
This is a very good article that highlights the importance of semantics.

------
hota_mazi
Conflating pointers and arrays seems pretty minor and not the cause of many
bugs.

The main source of bugs in C to me would be pointer arithmetic.

~~~
WalterBright
Pointer arithmetic is mainly used to access arrays, and is where the buffer
overflows come from. Using actual arrays instead allows the compiler to insert
overflow checking code.

------
nearmuse
What's the mistake? You pass a pointer and the number of elements, it's just
the C way. At any point in time you have to pay attention. What is the
proposal here? Make all arrays structures? Or add some weird un-C syntactic
sugar?

~~~
SamReidHughes
It's a question of priorities. It depends whether your goal is to maximize
productivity and minimize the defect rate, or if your goal is to tell people
they need to pay attention.

------
bluecalm
Why is this such a serious issue? I mean it is inconvenient to always pass
length along with the pointer but it's not that inconvenient. It's a bit more
typing but that's where problems end.

------
altrego99
Agree that this is a problem (if the programmer is not careful).

But serious question, why even bother with this one fix?

The only reason for the fix is to make it more difficult to make errors.

Fix arrays, then you would fix null pointer, then you might add objects,
templating/generics to support a good collections library, rtti, and before
you know it you are creating another one of c++, D, go, java. And we already
have those.

C paved the way. Why not let it be the end of it?

~~~
WalterBright
Because buffer overflows are probably the number 1 security bug in C programs.

~~~
ahmedalsudani
I was wondering why you were championing this idea and agreeing with the
posted link in almost every way. Then I went back to the link and figured it
out :)

P.S. thank you for everything you have done with D. I read in another HN
thread about Better C, and it convinced me that D is the language I should be
investing my time in learning and using.

~~~
bachmeier
> I read in another HN thread about Better C

A good tool to check out, but which hasn't been promoted much because it's
new, is dpp[1]. You can directly reference C header files in your D code. With
that, betterC mode becomes a viable option for adding to an existing C
project.

[1] [https://github.com/atilaneves/dpp](https://github.com/atilaneves/dpp)

------
toolslive
Isn't the fact that core types don't have a fixed representation a bigger
mistake? A char can be 16 bits, for example, and so on.

~~~
flingo
When can a char be 16 bits? I presume it'd still have a sizeof() of 1 though.

~~~
toolslive
Texas Instruments C54x DSPs

It can even be funkier, like 12 bits in a char

[https://stackoverflow.com/questions/2098149/what-platforms-h...](https://stackoverflow.com/questions/2098149/what-platforms-have-something-other-than-8-bit-char)

It's a mess
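
On the question above: `sizeof(char)` is 1 by definition on every implementation, and the number of bits in that unit is `CHAR_BIT` from `<limits.h>`, which the standard only requires to be at least 8. A tiny sketch, with a made-up helper name:

```c
#include <limits.h>
#include <stddef.h>

/* Convert a size reported by sizeof (in chars) to a size in bits. */
static size_t bits_in_object(size_t size_in_chars)
{
    return size_in_chars * CHAR_BIT;
}
```

On a platform with 16-bit chars, `bits_in_object(sizeof(char))` would be 16 while `sizeof(char)` is still 1.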

------
nurettin
Fat pointers manifested themselves in Pascal as strings and are still being
used in modern Delphi.

------
apz28
I would love it if one day programming adhered to the kind of discipline found
in bridge and car safety. If simple malpractice landed people in jail, there
would be no argument or discussion about this stupid mistake, which can be
verified by a tool. Cheers, Pham

------
xaduha
That's why I hope Red/System and just Red in general takes off
[https://static.red-lang.org/red-system-specs.html](https://static.red-lang.org/red-system-specs.html)

------
robert_foss
Fair enough.

Arrays losing dimensionality when passed through functions is a pain every now
and then.
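
C99 variably modified parameters recover some of this: the dimensions are passed first and the array parameter is declared in terms of them, so `m[r][c]` indexing works inside the callee. A sketch with a hypothetical function; support is mandatory in C99 and optional in C11, and nothing here is bounds-checked by the compiler.

```c
#include <stddef.h>

/* m is a rows x cols matrix; cols is part of the parameter's type. */
static double column_sum(size_t rows, size_t cols,
                         double m[rows][cols], size_t col)
{
    double total = 0.0;
    if (col >= cols)
        return total;            /* manual check, not compiler-inserted */
    for (size_t r = 0; r < rows; r++)
        total += m[r][col];
    return total;
}
```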

------
grrrrrrrrrrrrr
"C retains the basic philosophy that programmers know what they are doing; it
only requires that they state their intentions explicitly."

The real 'mistake', is programmers not stating their intention explicitly.

------
flingo
Is it better to pass the length of the array, or a pointer to the last valid
address in the array? (or one past that) There's probably an advantage in the
two types being the same.

Thought of this as I was reading the article.

~~~
jibal
The length is better, since that's almost always what you want.

> There's probably an advantage in the two types being the same.

Not really.
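
For reference, the two conventions being compared look like this in C. Both function names are made up for the sketch; the second takes a one-past-the-end pointer.

```c
#include <stddef.h>

/* Pointer plus length. */
static long sum_len(const int *p, size_t n)
{
    long total = 0;
    for (size_t i = 0; i < n; i++)
        total += p[i];
    return total;
}

/* Pointer plus one-past-the-end pointer. */
static long sum_range(const int *begin, const int *end)
{
    long total = 0;
    for (const int *p = begin; p != end; ++p)
        total += *p;
    return total;
}
```

The two carry the same information, since `end - begin` gives the length and `p + n` gives the end.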

------
rurban
The mentioned Safe C Library is now at
[https://github.com/rurban/safeclib](https://github.com/rurban/safeclib)

~~~
pjmlp
The problem with secure C11 Annex K functions, is that they are only secure in
name.

They are still as insecure as any traditional C string and memory function.

Yes, they sorted out the issues about when a string always gets its null
terminator.

However given that buffer and size are still two different parameters, the
issue of mixing up the values is still present.

~~~
rurban
Nope. The buffer size is checked at compile-time. Much like glibc fortify,
just better. There's no chance to mix them up. Even the spec'd unsafeties of
the truncating n versions are fixed.

~~~
pjmlp
Can you please explain how strcpy_s() validates that _dest_ actually points to
a memory region with enough space for _destsz_ bytes?

[https://en.cppreference.com/w/c/string/byte/strcpy](https://en.cppreference.com/w/c/string/byte/strcpy)

------
analognoise
Fat Pointers - Pascal has had them since I think the beginning?

So...30+ years later, we decide Pascal was right. Just saying. Shoutout to
FreePascal/Lazarus!

~~~
WalterBright
In Pascal, every array with a different dimension was a different type.

~~~
clouddrover
That was true in the past, but Pascal has had dynamic arrays for over twenty
years. The current versions of Free Pascal and Delphi are nice to use.

~~~
WalterBright
I kinda gave up on Pascal 35 years ago when I picked up a copy of K+R :-)

The only thing I really liked from Pascal were the nested functions, which I
put in D.

------
IshKebab
I feel like someone should just write a new language that is C with fixes, and
no more.

~~~
pjmlp
Has been tried a couple of times, the problem is mostly human not technical.

------
Annatar
I don’t understand what the hoopla is about: in assembler we deal with arrays
by having to know the size of each field without giving it a second thought.
The solution is to learn assembler first, then move on to C. And AWK, as the
next generation C, doesn’t have this problem, or any of the C problems.

~~~
WalterBright
I've written a lot of assembler code (including Empire in 100% assembler
[https://github.com/DigitalMars/Empire-for-PDP-11](https://github.com/DigitalMars/Empire-for-PDP-11)).

Assembler programs are very tedious to write, and so they tend to be rather
small. You don't get any help from the non-existent compiler for even simple
HLL features like static type checking.

~~~
Annatar
As a demo scene coder, I have to disagree vehemently: assembler is a joy to
write, at least on MC68000 and 6502 (Amiga and C=64). The only reason why I
don’t write code in assembler but in C on UNIX is because of portability. To
me, C is nothing more than a portable assembler with the extra unnecessary
code because of the inefficiencies of optimizing compilers.

------
known
Difference between Array and Linked List is enough to start confusion on
pointers

~~~
jibal
Say wut?

------
rebootthesystem
My guess is this won't be a popular post given the average age of HN
participants.

There's nothing whatsoever wrong with C. The problem is programmers who grew
up completely and utterly disconnected from the machine.

I am from that generation that actually did useful things with machine
language. I said "machine language" not "assembler". Yes, I am one of those
guys who actually programmed IMSAI era machines using toggle switches.
Thankfully not for long.

There is no such thing as an "array". That's a human construct. All you have
is some registers and a pile of memory with addresses to go store and retrieve
things from it. That's it. That is the entire reality of computing.

And so, you can choose to be a knowledgeable software developer and be keenly
aware of what the words you type on your screen actually do or you can live in
ignorance of this and perennially think things are broken.

In C you are responsible for understanding that you are not typing magical
words that solve all your problems. You are in charge. An array, as such, is
just the address of the starting point of some bunch of numbers you are
storing in a chunk of memory. Done. Period.

Past that, one can choose to understand and work with this or saddle a
language with all kinds of additional code that removes the programmer from
the responsibility of knowing what's going on at the expense of having to
execute TONS of UNNECESSARY code every single time one wants to do anything at
all. An array ceases to be a chunk-o-data and becomes that plus a bunch of
other stuff in memory which, in turn, relies on a pile of code that wraps it
into something that a programmer can use without much thought given.

This is how, for example, coding something like a Genetic Algorithm in
Objective-C can be hundreds of times slower than re-coding it in C (or C++),
where you actually have to mind what you are doing.

To me that's just laziness. Or lack of education. Or both. I have never, ever,
had any issues with magical things happening in C because, well, I understand
what it is and what it is not. Sure, yeah, I program and have programmed in
dozens of languages far more advanced than C, from C++ to APL, LISP, Python,
Objective-C and others. And I have found that C --or the language-- is never
the problem, it's the programmer that's the problem.

I wonder how much energy the world wastes because of the overhead of
"advanced" languages? There's a real cost to this in time, energy and
resources.

This reminds me of something completely unrelated to programming. On a visit
to windmills in The Netherlands we noted that there were no safety barriers to
the spinning gears within the windmill. In the US you would likely have lexan
shields protecting people and kids from sticking their hands into a gear. In
other parts of the world people are expected to be intelligent and responsible
enough to understand the danger, not do stupid things and teach their children
the same. Only one of those is a formula for breeding people who will not do
dumb things.

Stop trying to fix it. There's nothing wrong with it. Fix the software
developer.

~~~
kazinator
> _There is no such thing as an "array". That's a human construct._

Oh yeah; social construct, I would say, like gender.

> _I am from that generation that actually did useful things with machine
> language._

Unfortunately, most of them are undefined behavior in C.

> _You are in charge._

Less so than you may imagine. You're in charge as long as you follow the ISO C
standard to the letter, and deviate from it only in ways granted by the
compiler documentation (or else, careful object code inspection and testing).

~~~
rebootthesystem
This is a typical misinterpretation of the reality of programming. There is no
such thing as undefined behavior. Once you get down to bits and bytes in
memory and instructions the processor does EXACTLY what it is designed to do
and told to do by the programmer.

Despite what many might believe the universe didn't come to a halt when all we
had was C and other "primitive" languages. The world ran and runs on massive
amounts of code written in C. And any issues were due to programmers, not the
language.

In the end it all reduces down to data and code in memory. It doesn't matter
what language it is created with. Languages that are closer to the metal
require the programmer to be highly skilled and also carefully plan and
understand the code down to the machine level.

Higher level languages --say, APL, which I used professionally for about ten
years-- disconnect you from all of that. They pad the heck out of data
structures and use costly (time and space) code to access these data
structures.

Object oriented languages add yet another layer of code on top of it all.

In the end a programmer can do absolutely everything done with advanced OO
languages in assembler, or more conveniently, C. The cost is in the initial
planning and the fact that a much more knowledgeable and skilled programmer is
required in order to get close to the machine.

As an example, someone who thinks of the machine as something that can
evaluate list comprehensions in Python and use OO to access data elements has
no clue whatsoever about what and how might be happening at the memory level
with their creations. Hence code bloat and slow code.

I am not, even for a second, proposing that the world must switch to pure C.
There is justification for being lazy and using languages that operate at a
much higher level of abstraction. Like I said above, I used APL for about ten
years and it was fantastic.

My point is that blaming C for a lack of understanding or awareness of what
happens at low levels isn't very honest at all. The processor does exactly
what you, the programmer, tell it to do. Save failures (whether by design or
such things as radiation triggered) I don't know of any processor that
creatively misinterprets or modifies instructions loaded from memory,
instructions put there by a programmer through one method or another.

Stop blaming languages and become better software developers.

~~~
kazinator
> _This is a typical misinterpretation of the reality of programming. There is
> no such thing as undefined behavior. Once you get down to bits and bytes in
> memory and instructions the processor does EXACTLY what it is designed to do
> and told to do by the programmer._

Sure.

Only problem is, all you have to do is change some code generation option on
the compiler command line and millions of lines of code now produce different
instructions. Or, keep those options the same, but use a different version of
that compiler: same thing.

> _The processor does exactly what you, the programmer, tell it to do._

Well, yes; and when you're doing that through C, you're telling the processor
what to do via a sort of autistic middleman.

C is not the low level; you can understand your processor on a very detailed
level and that expertise won't mean a thing if you don't understand the ways
in which you can be screwed by the C language that have nothing to do with
that processor.

I suspect that you don't know some important things about C if you think it's
just a straightforward way to instruct the processor at the low level.

> _Languages that are closer to the metal require the programmer to be highly
> skilled and also carefully plan and understand the code down to the machine
> level._

C isn't one of these languages. (At least not any more!) It's considerably far
from the metal, and requires a somewhat different set of skills than what the
assembly language coder brings to the table, yet without entirely rendering
useless what that coder _does_ bring to the table.

~~~
rebootthesystem
> all you have to do is change some code generation option on the compiler
> command line and millions of lines of code now produce different
> instructions.

It is the responsibility of a capable software engineer to KNOW these things
and NOT break code in this manner.

You are trying to blame compilers and languages for the failure of modern
software engineers to truly understand what they are doing and the machine
they are doing it on.

If you truly understand the chosen language, the compiler, the machine and
take the time to plan, guess what happens? You write excellent code that has
few, if any bugs, and everyone walks away happy.

And you sure as heck are not confused or challenged in any way by pointers. I
mean, for Picard's sake, they are just memory addresses. I'll never understand
why people get wrapped around an axle with the concept.

I wonder, when people program in, say Python, do they take the time to know
--and I mean really know-- how various data types are stored, represented and
managed in memory? My guess is that 99.999% of Python programmers have no
clue. And I might be short by a few zeros.

We've reached a moment in software engineering where people call themselves
"software engineers" and yet have no clue what the very technologies they are
using might be doing under the hood. And then, when things go wrong, they
blame the language, the compiler, the platform and the phase of the moon. They
never stop to think that it is their professional duty to KNOW these things
and KNOW how to use the tools correctly in the context of the hardware they
might be addressing.

I've also been working with programmable logic and FPGAs, well, ever since
the stuff was invented. Hardware is far less forgiving than software --and
costly. It forces one to be far more aware of, quite literally, what every
single bit is doing and how it is being handled. One has to understand what
the funny words one types translate into at the hardware level. You have to
think hardware as you type what looks like software. You see flip-flops and
shift registers in your statements.

This is very much the way a skilled software developer used to function before
people started to pull farther and farther away from the machine. It is
undeniable that today's software is bloated and slow. Horribly so. And 100% of
that is because we've gotten lazy. Not more productive, lazy.

~~~
kazinator
> _It is the responsibility of a capable software engineer_

Nobody is saying that it's acceptable for an engineer to screw up and then
blame it on the tools (compiler, slide rule, calculator, ...).

However, if something goes wrong in your work, it's foolish not to recognize
the role of the tools, even though it's not acceptable to blame them as a
public position.

As objective observers of a situation gone wrong in engineering, we do have
the privilege of assigning blame between people and tools. Tools are the work
of people also. The _choice_ of tools is also susceptible to criticism. We
have to be able to take an objective look at our own work.

------
okket
(2009)

See also discussion from 9 years ago:
[https://news.ycombinator.com/item?id=1014533](https://news.ycombinator.com/item?id=1014533)
(47 comments)

------
auslander
OpenBSD replaced strcat by strlcat, strcpy by strlcpy 20 years ago, in OpenBSD
2.4.

They are implemented in the C libraries for OpenBSD, FreeBSD, NetBSD, Solaris,
OS X, and QNX.

They have _not_ been included in the GNU C library used by Linux.

~~~
WalterBright
Those functions have a separate length parameter. There is no way to
mechanically check that the length argument accurately reflects the length of
the string. It's not an effective solution.

~~~
auslander
Not an expert, but shouldn't they have a length parameter? It makes sense.

~~~
acehreli
In current C, yes, they should. The whole point is, the length parameter should
not be separate from the array. It's even worse than that: the parameter is
not an "array", it's a pointer to a _single_ element. This whole thing relies
on a convention and human attention; can't work in practice.

~~~
grrrrrrrrrrrrr
And yet, it clearly does (work).

When it doesn't (work), it is NOT because of a failure in the language; it is
because C has (and always will have) the "basic philosophy that programmers
know what they are doing;".

Criticising C is like criticising assembly. What's the point?

If people want to criticise a programming language, then they should _always_
start with C++, not C.

C++ was designed to allow us to develop bigger and more complex programs, and
yet, C++ inherited from C?

How stupid was that! But people are happy to give out various awards and
medals to the person who made one of the dumbest decisions ever made, in the
whole history of computing!

Leave C alone. It's fine. It's C++ that is the problem.

~~~
auslander
Were the OpenBSD people wrong to make strlcat and strlcpy? Honest question.

~~~
grrrrrrrrrrrrr
That is a _library_ issue.

Trying to make C 'foolproof' however, is an exercise in futility, and in any
case, can only come about by morphing it into a fundamentally different
language.

An argument in this thread is that you shouldn't be able to pass an array
without the argument being passed having some _implicit_ 'size' element
associated with it. That is NOT C.

Conflating pointers with arrays, that _is_ C.

Again I feel the need to quote this:

"C retains the basic philosophy that programmers know what they are doing; it
only requires that they state their intentions explicitly."

If you don't know what you're doing, don't use C.

C should be considered a 'specialist' language - much like doing brain surgery
- if you're doing it, you better know what you're doing, else go be a GP or
something.

And, if your project doesn't absolutely require that you use C, don't use
it. Instead, use something that is more 'foolproof'. (and I don't mean C++!!!)

D should focus less on being a better C, and more on being a replacement for
C++. Then, I might take D more seriously.

No attempt to morph C (i.e. the language, not the library) into something else
will ever succeed.

Leave C alone!

------
the_duke
Should have a 2009 in the title.

------
kyberias
Why does the author of D want to "fix" C by changing it into D? Concentrate on
that D language.

~~~
WalterBright
I like to fix things. Why not share a simple and effective fix?

~~~
kyberias
Isn't there a standards organization / body for the C language? Have you
proposed this there? What was the outcome?

------
earenndil
> Notable among these are C++, the D programming language, and most recently,
> Go

I would remove go from that list, and add rust and zig.

~~~
dosshell
This was written in 2009, didn't Rust first appear in 2010?

~~~
lightgreen
Why was it posted now without a year in square brackets? That was misleading.

~~~
jibal
The year is in parentheses in the title, and of course it's in the article. I
for one have learned to always look at the date something was written. It irks
me that there are so many web pages with time-relevant content that contain no
date.

------
Drdrdrq
Meh. What do you do with the dynamically allocated arrays then? Do you pass
their dimensions alongside the pointer? If that bothers you so much, you can
create a struct that holds the pointer and metadata, and do the checks
yourself. Calling this "C's biggest mistake" is a bit sensationalistic.

EDIT: besides, you should start new projects in Rust anyway, because it takes
security to a whole other level. C did a great job, but it's a bit old. :)

~~~
PeCaN
>besides, you should start new projects in Rust anyway

Thanks, I almost forgot what website I was on for a second.

~~~
Drdrdrq
No problem - happened to me 6 hours ago too. </s>

