
C’s Biggest Mistake (2009) - todsacerdoti
https://digitalmars.com/articles/C-biggest-mistake.html
======
WalterBright
Author here. I'll be blunt and repeat a prediction I made 3 years ago or so:

C is finished if it doesn't address the buffer overflow problem, and this
proposal is a simple, easy, backwards compatible way to do it. It is simply
too expensive to deal with buffer overflow bugs anymore.

This one addition will revolutionize C programming like adding function
prototypes did.

~~~
baby
I don't see a future where C survives, not only because of memory corruption
bugs (although that's a pretty big one), but also for usability: the lack of
package manager, common build system, good documentation, good standard
library, etc. are just too much to compete with any modern system language.

~~~
blogant
> lack of package manager, common build system, good documentation.

This is where C is superior to virtually every other language. It has K&R to
start with [1], a wealth of examples to progress from there, man pages,
autotools, cmake, static and shared libraries.

> good standard library.

It should have hash tables at least, but it isn't bad.

[1] Which is still the best language book ever written (yes, it has some anti
patterns, you unlearn them quickly).

~~~
josephg
Huh? In what way are C’s books, documentation or build system superior to those
found in other languages? Most languages have plenty of good books written
about them. And plenty of code examples online. I can’t speak for other
languages but I find MDN (Javascript) and the rust docs consistently better
than C’s man pages. Ruby’s documentation is great too.

As for build systems, autotools is a hilarious clown car of a disaster. You
write a script (automake) to generate a huge, slow script (configure) to
generate a makefile to finally invoke gcc? It is shockingly convoluted. It
seems more like a code generation art project than something people should
use. CMake papers over it about as well as it can, but I think cmake is (maybe
by necessity) more complex than some other entire programming languages. In
comparison, in rust “cargo build” will build my project correctly on any
platform, any time, with usually no effort on my part beyond writing the names
and versions of my dependencies in a file.

And as for package management, C is stuck in the 80s. It limps by, but it
doesn’t have a package manager as we know them today. There’s no cargo, gems,
npm, etc equivalent. Apt is no solution if you want your software to work on
multiple distros (which all have their own ideas about versioning). Let alone
writing software that builds on windows, Mac and Linux.

So no, C is not superior to other modern languages in its docs, build system
or package manager. It is vastly inferior. I still love it. But we’ve gotten
much, much better at making tooling in the last few decades. And sadly that
innovation hasn’t been ported back to C.

~~~
kps
> _in rust “cargo build” will build my project correctly on any platform_

I've got a parts drawer full of controllers that says it won't.

~~~
estebank
You are conflating consistency of package management and library behavior
across platforms with platform support.

When it comes to microcontrollers rust is currently at the mercy of LLVM
support and vendors.

------
dpc_pw
I love how so many people here argue with _the Walter Bright_ about technical
aspects of C.

I have been a member of many programming languages communities, and every
language has its own culture. C was always a language for the arrogant. "The
real programmers" that can handle their memory, not afraid to work with
pointers and that can get their code right.

I've been there, done that for many years, and became more humble with time.
In a way I still love the brutal simplicity and low-level nature of C, but I
would use it only if I absolutely can't use any other language for technical
reasons, and I would be really, really cautious.

~~~
moon4u
> Oh, how they dare argue with WALTER BRIGHT. The hubris!

With all due respect, but if someone says something invalid, the fact that
they have authority on a subject does not mean that we should agree.

As far as I understand the article (And I'm not the great Walter Bright, so I
may be wrong) - the author states that "void foo(char a[..])" is better syntax
than "void foo(size_t s, char a[])" but does not provide any arguments for it.
Furthermore, the author initially fails to mention that there has been an
attempt to fix the array-to-pointer-decay issue, when discussing "C's Biggest
Mistake".

So, yeah, the author may be right that this has been C's biggest mistake. I
don't know whether that is true or not, I do not have his experience. It is
certainly true that this mistake would be high on rankings of all mistakes
that C did. Still, the initial "sleight of hand" move followed by
unsubstantiated argument leads to a post with the quality similar to that of a
twitter post. Maybe even worse, since, you know, it's posted on a place other
than twitter, so we are actually talking about it as if it was something
serious.

------
SamReidHughes
The biggest mistake to me feels like implicit integer conversions. That's
where C feels like it's really out to get you.
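
A minimal sketch of the kind of surprise meant here, assuming a typical platform where `unsigned char` is 8 bits wide. The function names are made up for illustration. Both conversions happen silently, without a cast in the source:

```c
#include <assert.h>
#include <limits.h>

/* Compares a negative int with an unsigned int. The usual arithmetic
   conversions turn the int into an unsigned value before comparing. */
static int negative_less_than_unsigned(void) {
    int a = -1;
    unsigned int b = 1u;
    return a < b;   /* a becomes UINT_MAX here, so the result is 0 */
}

/* Stores an int into an unsigned char. Out-of-range values are reduced
   modulo UCHAR_MAX + 1 with no diagnostic required. */
static unsigned char narrow_to_byte(int value) {
    unsigned char c = value;
    return c;
}
```

Compilers can warn about both cases (for example with sign-comparison and conversion warnings), but the language itself accepts them.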

~~~
leetcrew
on a somewhat related note, I've always wished for something like `explicit`
that prevents assigning different typedefs for the same underlying type to
each other. like suppose I have two types, WorldVec (vector in worldspace) and
ViewVec (vector in view/screenspace). under the hood they are both typedefs for
float[3], so I can freely assign them back and forth. but any vector operation
that mixes the types would almost always be a bug, since they are in different
spaces. would be cool to get this functionality out of the humble typedef.

~~~
Reelin
This has always bugged me as well. I've generally solved this by wrapping
things in a struct. Type checking will use the (incompatible) wrappers and a
modern compiler should optimize them away. To avoid strict aliasing violations
when converting between equivalent wrapped types you can use a union and
employ a function to hide the verbosity.

I have no idea if this is the "right" way to do things, but it seems to work.
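
A rough sketch of the wrapper idea, using the WorldVec/ViewVec names from the parent comment. The helper names are hypothetical, and the conversion uses `memcpy` rather than the union mentioned above, simply because it is the shortest way to show an explicit, named conversion:

```c
#include <assert.h>
#include <string.h>

/* Two distinct struct types with identical layout. The compiler
   refuses to assign one to the other or mix them in a call. */
typedef struct { float v[3]; } WorldVec;
typedef struct { float v[3]; } ViewVec;

/* Only accepts world-space vectors; passing a ViewVec is a compile error. */
static WorldVec world_add(WorldVec a, WorldVec b) {
    WorldVec r;
    for (int i = 0; i < 3; i++) {
        r.v[i] = a.v[i] + b.v[i];
    }
    return r;
}

/* The one place where the space is deliberately reinterpreted.
   A real program would apply a view transform here instead. */
static ViewVec view_from_world_raw(WorldVec w) {
    ViewVec out;
    memcpy(out.v, w.v, sizeof out.v);
    return out;
}
```

With this in place, `world_add(some_world_vec, some_view_vec)` fails to compile, which is the behavior the plain typedefs could not provide.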

~~~
alternatetwo
That's what Microsoft did from some version on in their build tools, all the
HANDLE's etc used to just be typedef void *, now they're a dummy struct each
(HANDLE__ *). Seems to be a good solution.

~~~
pjmlp
That was already opt-in back in the Windows 3.1 days, but few bothered to use
it as such.

Back when I was doing pure Windows C for a while, this helped quite a bit,

Basically you had to #include<windowsx.h> and define the _STRICT_ macro.

There were also several utilities that made it much easier to deal with
events, dialogs and callbacks.

[https://docs.microsoft.com/en-us/windows/win32/api/windowsx/](https://docs.microsoft.com/en-us/windows/win32/api/windowsx/)

[https://jeffpar.github.io/kbarchive/kb/083/Q83456/](https://jeffpar.github.io/kbarchive/kb/083/Q83456/)

I got to learn it via the "Programmer's introduction to Windows 3.1" book,

[https://archive.org/details/programmersintro00myer](https://archive.org/details/programmersintro00myer)

------
pritovido
I actually believe the way C works is great, as simple as it could be and it
gives you power to do exactly what you want, the way you want it.

I would certainly hate it (and would continue using them btw) if they removed
normal pointers from C. It would be like removing s expressions from lisp.

I believe that the solution to this "mistake" is just not using c directly,
using other languages to write C code for you, or use c primitives that are
100% well tested.

That is what we do, our c primitives-libraries-modules are written and tested
by lisp and our own language.

It is then very easy to use that code in python or c++, swift or whatever as
libraries or modules.

~~~
AnimalMuppet
My solution is different: Don't use dynamically-allocated variable-sized
buffers unless you have to.

C++ gives you more tools for avoiding them than C, BTW.

~~~
MaxBarraclough
> C++ gives you more tools for avoiding them

A direct link, for the curious:
[https://en.cppreference.com/w/cpp/container/array](https://en.cppreference.com/w/cpp/container/array)

As an aside, the _C’s Biggest Mistake_ article cropped up in HN discussion 8
days ago,
[https://news.ycombinator.com/item?id=24373728](https://news.ycombinator.com/item?id=24373728)

~~~
saagarjha
C++ also lets you grab an array’s size via templates, FWIW.

~~~
MaxBarraclough
Do you mean _std::extent_? [0] You can do the same in C if you define a macro
that uses _sizeof_ [1]

This doesn't diminish the advantage of _std::array_ , though, as it embeds the
size of the array into the object, unlike when a raw array is passed and
'decays' to a pointer.

[0]
[https://en.cppreference.com/w/cpp/types/extent](https://en.cppreference.com/w/cpp/types/extent)

[1]
[https://stackoverflow.com/a/4415646/](https://stackoverflow.com/a/4415646/)
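
For reference, a small sketch of the `sizeof` macro approach and of where it stops working. `ARRAY_LEN` and the function names are illustrative, not taken from the linked answer:

```c
#include <assert.h>
#include <stddef.h>

/* Element count of a true array. Only meaningful where the array
   type is still visible, i.e. before it has decayed to a pointer. */
#define ARRAY_LEN(a) (sizeof(a) / sizeof((a)[0]))

static size_t len_of_local_array(void) {
    int xs[7] = {0};
    return ARRAY_LEN(xs);   /* the array type is visible: yields 7 */
}

/* Despite the [7], the parameter has type 'int *', so sizeof reports
   the size of a pointer. Many compilers warn about this line. */
static size_t size_seen_by_callee(int xs[7]) {
    return sizeof(xs);
}
```

This is the decay problem in miniature: the same macro applied inside `size_seen_by_callee` would silently compute the wrong length.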

~~~
saagarjha
I mean like you can pass it through a function by using the “array syntax”
when defining the parameter and making the size a template parameter. Like so:

    
    
      template <size_t N>
      void foo(int (&bar)[N])
    

And this gives you the size without an additional size parameter as you’d
usually need in C (of course with the limitation that the parameter now has to
be a compile-time sized array).

------
ktpsns
I don't think it is a mistake in language design. In the 90s, memory was a
scarce resource, and it still is in the microprocessor world, where "only" a few
kilobytes of RAM are available. There are performance critical paths where
passing a size_t is just unnecessary.

The actual mistake is not passing a size_t as a user. This is one kind of
"premature optimization". We can safely say the language design doesn't
encourage the user to write safe code, and successor languages do that.

Don't get me wrong: I'm just trying to make the point that C itself is not
what is to blame. It's people using computers who write the million dollar bugs.

~~~
WalterBright
The #1 undetected bug problem with C programs is buffer overflows. Experience
shows it is extremely difficult to verify that arbitrary C code doesn't have
buffer overflows in it. Assistance from the core language design can improve
things a great deal.

D allows passing both raw pointers as parameters and pointer/length pairs.
It's up to the user to choose. In practice, people have simply moved away from
using raw pointers into buffers.

As for performance, in C to determine the length of a string one uses
strlen(). Over and over and over again on the same string. This can be a major
performance problem, even without considering the memory cache effects. When I
look at speeding up C code, often the first nuggets of gold come from reviewing all
the explicit and implicit uses of strlen(). (Implicit uses are functions like
strcat()). It's also the first place I look for bugs when reviewing C code -
anytime you see a sequence of strlen, strcat, strcpy, it's often broken
(typically in neglecting somewhere to account for the extra 0 byte).
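
As an illustration of the alternative, here is a sketch of an append helper that carries the length along instead of rediscovering it with strlen() on every call. `append_n` is a hypothetical helper, not a standard function:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Appends src_len bytes from src to dst. *dst_len is the current string
   length and cap is the full buffer size, terminator included.
   Returns 0 on success, or -1 if the result plus its 0 byte would not fit. */
static int append_n(char *dst, size_t *dst_len, size_t cap,
                    const char *src, size_t src_len) {
    if (*dst_len >= cap || src_len > cap - *dst_len - 1) {
        return -1;                      /* no room for data plus the 0 byte */
    }
    memcpy(dst + *dst_len, src, src_len);
    *dst_len += src_len;
    dst[*dst_len] = '\0';
    return 0;
}
```

Each append is a single memcpy with no rescanning, and the room for the terminating 0 byte is checked in exactly one place.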

~~~
Gibbon1
All of this I agree with. In a better world 'arrays' would have been added in the
1980s. The argument about memory limitations is spurious since if you're
writing good code you always pass a pointer and the length. Always, no
exceptions.

Yeah and all the string functions should have been marked as deprecated with
C89 and fully deprecated with C99.

~~~
mark-r
Yes, you can always pass a pointer and a length explicitly. And that's what
the "safe" versions of e.g. string functions do. But it's still incumbent on
you as the programmer to use them properly. It would still be beneficial to
have a compiler mode where all that was done for you automatically and it was
impossible to have a buffer overrun.

~~~
Gibbon1
The safe string functions aren't safe because they don't return safe strings.

------
david2ndaccount
In C you can declare pointers to arrays, the syntax is just somewhat strange.
You can even declare it as a pointer to a variable sized array with c99, eg:

    
    
      void foo(size_t length, char (*x)[length]){
          size_t size = sizeof(*x);
          assert(size == length);
          printf("sizeof(*x): %zu\n", sizeof(*x));
      }

~~~
dependenttypes
Here is a post from 2014 from someone who is in the C standard committee.
[https://gustedt.wordpress.com/2014/09/08/dont-use-fake-
matri...](https://gustedt.wordpress.com/2014/09/08/dont-use-fake-matrices/)

------
ScottBurson
Yes. As I've long said, _C does not have arrays_. It has pointers and
notations for initializing memory, but it does not have arrays.

There are actually two problems here. One is the absence of bounds checking.
The other, which is related but technically orthogonal, is the hole in the
type system: an array is not an object. It's been true since Unix v7 that you
can pass or return a struct by value, but you can't pass an array by value
unless you wrap it in a struct.

The type system also makes no distinction between a pointer to a single object
and a pointer into an array of objects. I've worked on static analysis tools
that try to find potential buffer overflows, and this turns out to be a
surprisingly big problem. One has to do a global dataflow analysis just to
discover which pointer variables could ever point at array elements.

~~~
hvdijk
It would help if you define what you mean by "object". This term has a
definition in the C standard by which arrays are unquestionably objects. It is
true that you cannot pass or return them by value, but that does not mean they
are not objects, that means the ability to pass or return things by value is
not a property of objects.

About the singular vs array question: this is true but just a special case of
the absence of bounds checking, is it not? C's approach of allowing e.g. "int
i;" to be addressed as if it were an array of length 1, allowing construction
not just of &i but also &i+1 as pointer values, is valid and sometimes useful,
but you have to make sure you never access *(&i+1). That's the same problem as
how given "int a[2];", accessing a[2] is not valid, as far as I can see.

~~~
jstimpfle
What GP means is "there are no array expressions in C's syntax". You can't
copy them or assign them, and I'm not sure that they are part of any kind of
"type system" in C (although some compilers have a notion of array type
internally, which is obvious from their error messages).

~~~
hvdijk
That's just not true, there are array expressions, and arrays are part of the
type system. I give him more credit than that. (The "you can't copy them" is
technically untrue as well but that is just because of sloppy wording.)

~~~
jstimpfle
You're right, I forgot about compound literals, which were introduced in C99
and which are a rarely used feature.

But I guess my statement that "C doesn't have array expressions" was true
before the advent of C99. And that's also why array decay made even more sense
back then. (It still makes a lot of sense today IMO).

~~~
hvdijk
I didn't mean compound literals, I do not really see how they change things
here, I meant that there are a few cases where arrays don't decay to pointers,
and supporting those requires compilers to make arrays a part of the type
system. Example: given "int a[3];", how else would you compute (&a+1) ?

~~~
jstimpfle
To me, the question is what is actually "the type system" and what is "the
allocation system" or whatever else aspect of implementing a compiler.

So determining the type of "&a" is not an issue, it's just one case in
determining the type of a C expression (look up the object "a", is it an
array? The type of the expression is a pointer to the array element type).

This is not a _special_ case, at least not more special than how to determine
the type of the expression "a", or "1".

No array type needed.

~~~
hvdijk
Are you aware that given int a[3];, (&a+1) and (a+3) denote the same address?
If you are, how can that possibly work if as you suggest, &a and a are
indistinguishable to the compiler, that they are both seen as a pointer to
int?
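
A short sketch that checks this claim directly. The function names are invented for the example:

```c
#include <assert.h>
#include <stddef.h>

/* &a has type int (*)[3], so &a + 1 steps over the whole array.
   a decays to int *, so a + 3 steps over three elements.
   Both land on the same one-past-the-end address. */
static int end_addresses_match(void) {
    int a[3] = {1, 2, 3};
    int (*past_whole_array)[3] = &a + 1;
    int *past_last_element = a + 3;
    return (void *)past_whole_array == (void *)past_last_element;
}

/* Distance in bytes covered by one step of a pointer-to-array. */
static size_t step_of_array_pointer(void) {
    int a[3] = {0};
    return (size_t)((char *)(&a + 1) - (char *)&a);
}
```

The second function returns `3 * sizeof(int)`, which only works because the compiler tracks `int[3]` as a type distinct from `int *`.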

~~~
jstimpfle
That came quite honestly as a huge surprise to me. I've never seen this
before. Thanks for the heads up!

------
pvg
Previously:

[https://news.ycombinator.com/item?id=17585357](https://news.ycombinator.com/item?id=17585357)

[https://news.ycombinator.com/item?id=1014533](https://news.ycombinator.com/item?id=1014533)

------
franciscop
This was probably the most confusing thing about C when I first started
learning programming back in the day. When you call a function you pass the
value, except with arrays, where it gets converted to a pointer. It was explained
back then to me that the reason is because copying the whole array was not
efficient so it was better to pass the reference.

~~~
quelsolaar
I think a better way to think about it is to say that when you type:

int a[10];

you allocate 10 integers and "a" is the pointer to the first one of them.

Arrays are just memory, just like what you get when calling malloc, and memory
is accessed using pointers in C.

~~~
Cyph0n
Minor nitpick: there is no allocation going on here - you’re simply reserving
a fixed-size buffer on the stack (assuming the array is local to a function).

~~~
quelsolaar
I would call that a stack allocation, but yes they are slightly different. In
my mind it's a feature that arrays allocated on the stack and heap can
interchangeably be given as an argument to a function.
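
A minimal sketch of that interchangeability, with made-up function names. The same callee serves a stack array and a malloc'd buffer, at the price of the caller supplying the length by hand:

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Works on any run of ints, wherever it lives. The callee cannot
   verify n; it has to trust the caller. */
static long sum_ints(const int *p, size_t n) {
    long total = 0;
    for (size_t i = 0; i < n; i++) {
        total += p[i];
    }
    return total;
}

/* Sums one stack array and one heap buffer through the same function.
   Returns -1 if the allocation fails. */
static long sum_stack_and_heap(void) {
    int on_stack[4] = {1, 2, 3, 4};
    size_t n = 4;
    int *on_heap = malloc(n * sizeof *on_heap);
    if (on_heap == NULL) {
        return -1;
    }
    for (size_t i = 0; i < n; i++) {
        on_heap[i] = 10;
    }
    long total = sum_ints(on_stack, 4) + sum_ints(on_heap, n);
    free(on_heap);
    return total;
}
```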

------
makecheck
Unfortunately working with dynamically-allocated buffers is still a thing;
adding array syntax just favors one form of size specification without solving
the other case.

Although C is usable on many types of hardware, interesting things could be
done, e.g. on desktop OSes if certain hardware-specific extensions were made.

One of the things I wish we could do with our now-absurdly-large pointers
(64-bit) is to reserve a handful of bits for other information such as the
size. Sure it means we can’t store anything at location 2^64-1 but it wasn’t
that long ago we only had 32-bit pointers and the 33rd bit is _twice as many
addresses_ all by itself so I think we can lose a few.

For example, if all allocations were rounded up to buckets of a certain size,
the precise byte count would not need to be encoded in the pointer (just the
number of buckets, requiring fewer bits). There could be a couple bits to give
pointers a type for other interesting scenarios, e.g. perhaps a pointer
identified as an “immediate value” that isn’t actually allocated at all, and
it is “dereferenced” by treating its “address” as the “stored” value. There
could even be a couple of bits to track use of common allocators (it would be
_so_ nice to simply know that a pointer was allocated by "malloc" vs. "new"
for example).

In high-level languages, then, the syntax change would be not to identify
arrays specifically but pointers with encodings that are “complete” (e.g.
"char const complete*" or something), covering both stack arrays and dynamic
buffers.
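
To make the bucket idea concrete, here is a toy sketch that works on plain 64-bit integers rather than real pointers. Everything in it is an assumption made for illustration: 48 significant address bits, 16-byte buckets, and a bucket count that fits in the 16 spare bits. Actual hardware and operating systems differ in which pointer bits are free.

```c
#include <assert.h>
#include <stdint.h>

#define BUCKET_BYTES 16u   /* assumed allocation granularity */
#define ADDR_BITS    48    /* assumed number of meaningful address bits */
#define ADDR_MASK    ((UINT64_C(1) << ADDR_BITS) - 1)

/* Packs an address and a size, rounded up to whole buckets, into one
   64-bit value. The bucket count must fit in the upper 16 bits. */
static uint64_t tag_address(uint64_t addr, uint64_t size_bytes) {
    uint64_t buckets = (size_bytes + BUCKET_BYTES - 1) / BUCKET_BYTES;
    return (buckets << ADDR_BITS) | (addr & ADDR_MASK);
}

/* Recovers the plain address. */
static uint64_t tagged_address(uint64_t tagged) {
    return tagged & ADDR_MASK;
}

/* Recovers the rounded-up capacity in bytes. */
static uint64_t tagged_capacity(uint64_t tagged) {
    return (tagged >> ADDR_BITS) * BUCKET_BYTES;
}
```

Note that the recovered capacity is the rounded-up bucket size, not the precise byte count, which is exactly the trade-off described above.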

~~~
WalterBright
> Unfortunately working with dynamically-allocated buffers is still a thing;
> adding array syntax just favors one form of size specification without
> solving the other case.

All that is needed is a mechanism for forming a fat pointer from a pointer and
a length. In D this looks like:

    
    
        int* p = cast(int*)malloc(length * sizeof(int));
        if (!p) fatalError();
        int[] a = p[0 .. length];
        ...
        int x = a[length + 1]; // runtime error: buffer overflow
    

In C, this could be done via a macro with no additional core language changes.
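
One way such a macro might look in C99, as a sketch only. `int_slice`, `SLICE`, `SLICE_OF_ARRAY` and `slice_get` are invented names, and unlike the D example the checked access reports an error code rather than halting the program:

```c
#include <assert.h>
#include <stddef.h>

/* A pointer/length pair for ints. */
typedef struct {
    int *ptr;
    size_t len;
} int_slice;

/* Rough analogue of p[lo .. hi]: builds the pair from a pointer and bounds. */
#define SLICE(p, lo, hi)  ((int_slice){ (p) + (lo), (size_t)((hi) - (lo)) })

/* Builds the pair from a true array, taking the length from its type. */
#define SLICE_OF_ARRAY(a) ((int_slice){ (a), sizeof(a) / sizeof((a)[0]) })

/* Stores element i in *out and returns 0, or returns -1 if i is out of range. */
static int slice_get(int_slice s, size_t i, int *out) {
    if (i >= s.len) {
        return -1;
    }
    *out = s.ptr[i];
    return 0;
}
```

The length still has to be right when the slice is first formed; after that point it travels with the pointer.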

~~~
akira2501
Tongue in cheek: I already have a fat-pointer:

    
    
        struct foo {
            int a[10];
        } f;
    
        func(&f);

------
mitchs
Something I see as wrong with C outside of the context of the Linux kernel is
mostly something wrong with us the developers. We are far too content to live
in filth.

In addition to unsafe/irregular buffer handling, I also constantly see poor
data structure choice, presumably due to a lack of default choice of library.
It is very common to see code scanning linked lists when they should be doing
map look ups. (And often even the linked list operations are ad-hoc and
repeated for every type of struct with an embedded next/prev pointer.)
Everyone always defaults to linked listing it up because they never have to
pay the up-front cost of finding a library or investing in re-inventing the
wheel. I think this is also why you see so much sketchy buffer code - no one
has bothered investing in safer buffer abstractions.

Perhaps some of this is caused by the difficulty of taking on dependencies in
a portable way. (CMake/Autotools can make this better, but it is a far cry
from NPM.)

------
kens
C's pointers wouldn't be an issue if the world had used the Intel iAPX 432
processor instead of the 8086. The iAPX 432 included bounds checking for every
array in hardware (among many other features), so it was impossible to make an
out-of-bounds access.

Unfortunately the iAPX 432 was delayed, so Intel introduced the 8086 as a
stopgap processor and computers have been using the x86 architecture ever
since. It's interesting to think that if history had gone a bit differently,
whole classes of security problems would not exist.

[https://en.wikipedia.org/wiki/Intel_iAPX_432](https://en.wikipedia.org/wiki/Intel_iAPX_432)

~~~
james412
C-the-language has no concept of size-tagged arrays at runtime, and I guess
it's baked in deeply due to the various guarantees made about sizeof(array),
&array[0], and ability to cast &array[0] back to the original array. The iAPX
hardware would have gone unused

~~~
pjmlp
Solaris uses SPARC ADI to great success, iPhone X has pointer validation, and
Android 11 has done the work to require ARM MTE in future releases.

~~~
james412
Pointer tagging is a completely different tech that requires no runtime
knowledge of the length of an array

------
chadcmulligan
Niklaus Wirth would concur - that was similar to his argument for pascal -
strings contain a size.

~~~
kps
Wirth's Pascal didn't have a string type. You could have a fixed-length array
of CHAR, but you couldn't have _fewer_ than 16 characters in a 16-character
array, and you couldn't pass a 16-character-array to a function with a
256-character-array parameter. Only the magic ‘functions’ built in to the
language like write() could accept strings of different lengths. Since this
made the language worse than FORTRAN at handling text, most implementations
added some sort of string handling.

~~~
pjmlp
Including the 2nd revision of Pascal, ISO Extended Pascal.

------
sys_64738
I’ve been writing C code for 30 years. What I like about it is code from back
then is pretty much similar to what you’d write today so there’s limited need
to chase the train. Sure it has security issues but if you do things ‘right’
it’s still untouchable, imo. Most other languages are heavy and slow compared
to C. A lot are not even backwardly compatible (python) or suffer from being
kitchen sinks (C++). My own opinion is that python is the only other ‘must
know’ language as you can do RAD. For speed you resort to C. There are so many
toy languages in between with some nice features (coroutines, native JSON
support, etc). But lots are just meh. All personal opinions.

~~~
monetus
Have you ever tried nim?

~~~
andi999
Does nim have numpy/scipy/matplotlib equivalents?

------
rurban
I agree that missing bounds checks are the biggest problem, with the unhealthy
attitude of dismissing the Annex K for political reasons. My Safe C library
has adoption only with the big players.

But even weirder is the total lack of a proper string library. Nobody but
Microsoft uses wchar, and they are the only ones with the proper wchar_t size.
Everybody else was wrong with size 4. But nowadays it should be clear that
u8 is the only way forward, C++ has even adopted char8_t for it now. But they
all still ignore the unicode problems with an overly simplistic, glorified
null-terminated memory buffer library. These are not strings anymore nowadays.
Strings have multiple representations of characters in unicode, strings need
the unicode version to be exposed which changes every year. They need a proper
fold case and norm API, otherwise you cannot compare them, so you cannot
search for strings. Grep would be happy to find unicode strings, but it still
cannot. coreutils still cannot do unicode in 2020.
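
A tiny sketch of the comparison problem. The two byte sequences below are both valid UTF-8 for "é", one precomposed and one using a combining accent, and a byte-wise comparison treats them as different strings. The constant and function names are made up for the example:

```c
#include <assert.h>
#include <string.h>

/* U+00E9, precomposed "e with acute" (UTF-8 bytes C3 A9). */
static const char E_ACUTE_COMPOSED[] = "\xC3\xA9";

/* 'e' followed by U+0301 combining acute accent (UTF-8 bytes 65 CC 81). */
static const char E_ACUTE_DECOMPOSED[] = "e\xCC\x81";

/* Byte-wise equality, which is all that strcmp can offer. */
static int bytes_equal(const char *a, const char *b) {
    return strcmp(a, b) == 0;
}
```

`bytes_equal(E_ACUTE_COMPOSED, E_ACUTE_DECOMPOSED)` is false even though both render as the same character. Making them compare equal requires normalization, which the C standard library does not provide.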

Also the complete lack of security, esp with names, ie identifiers. Such as
pathnames. Most filesystems just ignore security, spoofing, bidi changes,
mixed scripts as if this problem does not exist at all. Strings are not
normalized, not properly fold cased.

The _l locale mess, it still relies on global runtime state, which is not
compile-time optimizable, in opposition to _l or simply just a new u8 API. Not
reentrant. Not compile-time optimizable. It's a huge mess.

gcc cannot do compile-time constexprs checks, only clang can, leading to up to
200x faster libc code. gcc cannot do user-defined warnings or errors.

glibc, FreeBSD libc, musl, none of it fixes anything.

------
GuB-42
I don't see a mistake here, certainly not a "biggest mistake".

This is C, not C++. Keep it simple.

Here, the idea is that there is no special type for "pointer+size" (what the
author proposes as an array). Ok, let's add one and see the implications.

- How do I get the size, the number of elements? A "sizeof" like operator?

- Can I resize the array? If yes, how? If no, why?

- What happens if I overflow? Undefined behavior?

- A memcpy-like would be an obvious function to implement, what happens if
sizes differ?

- What is the relationship between static arrays (ex: int a[5]) and
"pointer+size" arrays? Are these completely different types? Is there an
implicit cast between the two?

- About casting, how can I go from a separate pointer and size to an array
and vice versa? If it is possible at all.

- What if I do a bit of pointer magic to access the internal representation
of the array? Probably undefined behavior.

It is much more complex than "just add array[..]", I expect more tradeoffs,
more undefined behaviors (C wouldn't be C without them). Adding complexity to
the language can actually make things worse.

As for zero-terminated strings, they have advantages and drawbacks. They are
preferred by the C library, but you can do pointer+size if you want by using
mem* instead of str* , or %.*s instead of %s in printf (not sure about this
one).
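
For what it's worth, `%.*s` does work this way: the precision is passed as an `int` argument and caps the number of bytes printed, so the text does not need a terminating zero at that position. A small sketch with hypothetical helper names:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Formats the first len bytes of text, in brackets, into out.
   Returns what snprintf returns: the length the full result needs. */
static int format_span(char *out, size_t out_cap,
                       const char *text, size_t len) {
    return snprintf(out, out_cap, "[%.*s]", (int)len, text);
}

/* Pointer+size comparison using mem* instead of str*. */
static int spans_equal(const char *a, size_t a_len,
                       const char *b, size_t b_len) {
    return a_len == b_len && memcmp(a, b, a_len) == 0;
}
```

The cast to `int` matters: passing a `size_t` where `*` expects an `int` is undefined behavior, and lengths above `INT_MAX` need separate handling.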

~~~
dnautics
Ziglang is basically this plus getting rid of the preprocessor and almost all
UB. It's extremely clean, quite safe, and in many ways far far simpler than C.

Here are its answers:

- How do I get the size, the number of elements?

Builtin .len field operator. @sizeOf works too.

- Can I resize the array? If yes, how? If no, why?

No. Array lengths are comptime known; there is something called a slice which
is runtime known and bounds checked in safe releases.

- What happens if I overflow? Undefined behavior?

Panic, in safe releases. UB in dangerous releases (small or fast)

- A memcpy-like would be an obvious function to implement, what happens if
sizes differ?

Bounds checked at runtime for safe releases.

- What is the relationship between static arrays (ex: int a[5]) and
"pointer+size" arrays? Are these completely different types? Is there an
implicit cast between the two?

Yes, and yes. Arrays can implicitly be converted to slices at compile time;
slicing into a slice with compile-time known index width yields an array.

- About casting, how can I go from a separate pointer and size to an array
and vice versa? If it is possible at all.

There is an escape hatch function for this.

- What if I do a bit of pointer magic to access the internal representation
of the array? Probably undefined behavior.

It's defined, but unsafe.

I haven't really worked too much in Zig (it's not my daily driver), but I
think it says something that all of these questions have answers to them, they
are sensible, and very easy to remember.

------
Taniwha
It's probably worth remembering the history of C - it didn't appear fully
formed, lots of stuff evolved as people used it - for example in V6 Unix +=
used to be =+.

In particular you used to use structures in a weird way, essentially field
names within structures lived in their own global name space, any pointer
could be used with any structure - there weren't unions yet so this was used
for good effect in Unix kernel drivers (there was a standard buffer queue
header you could add your own stuff at the end of).

I think this was kind of descended from the BCPL/Bliss world view where
explicit pointers were a relatively new thing in languages and their typing
was pretty simple (there was a limit to the number of indirections allowed) -
fully orthogonal typing systems were only just becoming a thing then.

Also I suspect that the idea that a[i] was the same as *(a+i) was an idea with
legs, this is still legal C:

    
    
      char *x()
      {
            int i; char *p;
            return &i[p];
      }
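
A self-contained variant of the same point, as a sketch with an invented function name. Since indexing is defined in terms of pointer addition, and addition commutes, the operands of `[]` can be swapped:

```c
#include <assert.h>

/* Checks that word[2], 2[word] and *(word + 2) all name the same
   element. Returns 1 if every form agrees. */
static int index_both_ways(void) {
    const char word[] = "array";
    return word[2] == 2[word]
        && word[2] == *(word + 2)
        && &word[2] == word + 2;
}
```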

------
Something1234
Stupid question, but how do I access the size of an array using this fancy new
declaration if it were to be added? It doesn't seem like any sugar is there to
provide "range based for loops."

Wait I would just use `sizeof` but then I'm still doing pointer math then?

~~~
WalterBright
A macro can be added to access the length property.

------
ummonk
The C standard doesn’t forbid fat pointers. You could have a compiler that
implements fat pointers (and crashes on out of bounds access attempts) without
violating the standards in any way, since out of bounds accesses are undefined
behavior.

------
Egoist
To me, i think C is a powerful language that is weakened by 2 things:

1- Trying to find the proper style and methods to write few lines of code. The
reason for this is because C is an old language that kept changing. Thus, you
can read a book, yet find someone to tell you “you shouldn’t do it that way”.

2- compilers made C into different flavors. Microsoft C compiler provides
scanf_s with the old scanf being deprecated. On the other hand, gcc has
different approaches without the scanf_s that Microsoft has. This can be so
annoying to use.

------
thayne
The proposed change has another important aspect to it: it would help
standardize a way of passing arrays across language boundaries. Currently when
using FFI, you often have to decompose fat pointers into pointer and length
before calling a foreign function, or compose a fat pointer from a pointer and
length in extern functions. This could mean languages like rust, d, etc. could
just pass arrays directly.

I for one would love to see this proposal become a reality.

------
quelsolaar
Personally I like it the way it is. If you want to copy an array when making a
function call you can define a struct with an array in it, and pass the
structure.
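
A minimal sketch of that workaround, with hypothetical names. Because the struct is passed by value, the callee gets its own copy of the array and the caller's data stays untouched:

```c
#include <assert.h>

/* An array wrapped in a struct so it can be copied like any other value. */
typedef struct {
    int v[4];
} Int4;

/* Modifies its private copy, then returns the sum of the elements. */
static int sum_after_zeroing_first(Int4 copy) {
    copy.v[0] = 0;
    return copy.v[0] + copy.v[1] + copy.v[2] + copy.v[3];
}
```

Calling it with `{{1, 2, 3, 4}}` yields 9, and the caller's first element is still 1 afterwards.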

If C did pass array lengths it still wouldn't matter since C doesn't (and in
my opinion shouldn't) check for overflows.

~~~
dependenttypes
> since C doesn't ... check for overflows

Because it's a language, not an implementation. An implementation is free to
do so (and there are such implementations after all).

~~~
quelsolaar
That's correct! C doesn't require checking for overflows, but it also doesn't
forbid implementations from doing so. Both are features.

~~~
Koshkin
I don’t think it is possible, not without changing some parts of C’s
specification. At the very least you’d need to be able to somehow encode the
length of the buffer in the pointer to it. (There is no semantic difference
between a pointer to a simple, fixed-length variable and a pointer to an
array.)

~~~
dependenttypes
void f(size_t n, int v[n]) is valid C.

~~~
Koshkin
Semantically though, the second argument is still just a naked pointer.

~~~
david2ndaccount
That’s why you use the proper declaration of

    
    
      void f(size_t n, int (*v)[n])
    
    

instead.
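
A small sketch of what the pointer-to-array form buys you, assuming a compiler with variably modified types (gcc and clang have them); the function names are made up. Note that nothing forces the caller to pass an `n` that matches the real array, so this documents the size rather than enforcing it:

```c
#include <assert.h>
#include <stddef.h>

/* v points at a whole array of n ints, so sizeof *v is n * sizeof(int). */
size_t bytes_seen(size_t n, int (*v)[n])
{
    assert(v != NULL);
    return sizeof *v;
}

/* Element count recovered from the pointed-to array type. */
size_t elems_seen(size_t n, int (*v)[n])
{
    assert(v != NULL);
    return sizeof *v / sizeof (*v)[0];
}
```

A call looks like `int a[8]; elems_seen(8, &a);`, passing the address of the array rather than the array itself.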

------
hvasilev
That is why "modern" C++ is frequently used as "C with classes and basic
safety".

I wonder if a better C can be made by just stripping-down the bloated C++ and
introducing the "unsafe" keyword for dangerous features like directly using
arrays, etc.

------
harry8

        typedef struct fat_ptr_t {
            size_t size;
            void * start;
        } fat_ptr_t;
    
        extern void foo(fat_ptr_t a);
    

if it's a good idea to use a fat pointer, why do we need new syntax to sell
it? What am I missing here?

~~~
EdSchouten
> What am I missing here?

- A concise syntax for declaring, accessing and mutating them. Dealing with
lists is such a common thing in programming languages, that it's simply crazy
to not have a proper syntax for them.

- Generics/templating, so that you can use concrete types instead of
'void *'. Having that prevents mistakes and also tends to make code
self-documenting.

"If it's such a good idea to use a safety belt and airbags, why do we need
special devices for it? Why can't I just use a piece of rope I had in a drawer
and some leftover balloons from my previous birthday party?"

~~~
harry8
There are lots of places you can make C's syntax more concise. What do you get
for changing the syntax here? Why is it worth it?

void* was merely for example. Yes make it typed when you use it in your C
code. Also use accessor functions to wrap array index so you can switch on &
off a macro for bounds checking, absolutely do that. Does new syntax change
anything if you do these things?
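
One possible reading of that suggestion, as a sketch: a typed version of the struct plus accessors whose check can be compiled out. The names `int_span`, `span_get`, `span_set` and the `SPAN_NO_CHECKS` switch are invented here.

```c
#include <assert.h>
#include <stddef.h>

/* Typed variant of the fat pointer; the name int_span is hypothetical. */
typedef struct int_span {
    size_t size;   /* number of elements */
    int *start;    /* first element */
} int_span;

/* Checks are on by default; build with -DSPAN_NO_CHECKS to drop them. */
#ifdef SPAN_NO_CHECKS
#define SPAN_CHECK(s, i) ((void)0)
#else
#define SPAN_CHECK(s, i) assert((i) < (s).size)
#endif

/* Read element i, with an optional bounds check. */
static inline int span_get(int_span s, size_t i)
{
    SPAN_CHECK(s, i);
    return s.start[i];
}

/* Write element i, with an optional bounds check. */
static inline void span_set(int_span s, size_t i, int value)
{
    SPAN_CHECK(s, i);
    s.start[i] = value;
}
```

This works, but every element type needs its own struct and accessors, which is the part the generics argument above is about.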

Don't much care for the seatbelt analogy there: replace your working, properly
fitted seatbelts with our red ones because they're easier to see? These kinds
of analogies always break down. Especially car analogies for programming and
yes, I use them too.

------
skywhopper
It’s not a “mistake”. This article is complaining about a misinterpretation of
C’s functionality. Arrays are not “real” data structures in C: there’s no such
thing. The array-ish syntax that’s available is just some syntactic sugar on
top of pointers. You could say that having the sugar at all is a mistake. Or
that C is incomplete without first-class array types. This is a cute hack, but
at this point (far more than 10 years ago) it’s probably better to move on to
Rust if you don’t like this aspect of C, rather than proposing to hack the
language.

~~~
WalterBright
> Arrays are not “real” data structures in C: there’s no such thing.

I assure you, there are arrays in C.

    
    
        int a[100]; // `a` is an array, not a pointer
        int* p;     // `p` is a pointer, not an array
    
        a = p;      // error, array is not a pointer
    

> The array-ish syntax that’s available is just a some syntactic sugar on top
> of pointers.

Sorry, this is incorrect. In some circumstances, C will implicitly convert an
array to a pointer, which is what the article is about, but don't mistake a
conversion with identity.
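
A compile-time sketch of that distinction (the function name is made up). A parameter written as `int arg[100]` would be adjusted to the same `int *arg` used here:

```c
#include <assert.h>
#include <stddef.h>

int a[100];   /* an array: storage for 100 ints */
int *p = a;   /* a pointer: `a` converts to &a[0] in this initializer */

/* In the defining scope the array type, and with it the length, is intact. */
static_assert(sizeof a == 100 * sizeof(int), "a is a whole array");
static_assert(sizeof p == sizeof(int *), "p is only an address");

/* After the call-site conversion only a pointer arrives; the 100 is gone. */
size_t bytes_visible_to_callee(int *arg)
{
    return sizeof arg;
}
```

Calling `bytes_visible_to_callee(a)` returns the size of a pointer, not of the array.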

------
juped
You can if you make them fixed-size arrays inside a struct. I agree with the
thesis of the article.

------
justinator
I remember my "Modern C as an Oxymoron" joke went over pretty badly.
Buuuuuuuuut...

------
andrepd
The "fat pointer" is the std::array in C++.

------
nairboon
Have you ever proposed a paper for this to WG 14?

------
peter_d_sherman
>"What mistake has caused more grief, more bugs, more workarounds, more
endless hours consumed, etc., than any other? Many people would say null
pointers. I don’t agree.

 _Conflating pointers with arrays._

I don’t mean them using the same syntax, or the implicit conversion of arrays
to pointers. I mean the inability to pass an array to a function as an array,
even if it is declared to be an array. C will silently convert the array to be
a pointer, and will rewrite the function declaration so it is semantically a
pointer:

[...]

This seemingly innocuous convenience feature is the root of endless evil. It
means that once arrays leave the scope in which they are defined, they become
pointers, and lose the information which gives the extent of the array — the
array dimension. What are the consequences of losing this information?

An alternative must be used.

For strings, _it’s the whole reason for the 0 terminator_.

For other arrays, it is inferred programmatically from the context. Naturally,
every situation is different, and so an endless array (!) of bugs ensues.

The trainwreck just unfolds in slow motion from there.

The galaxy of C string functions, from the unsafe strcpy() to sprintf()
onwards, is a direct result. There are various attempts at fixing this, such
as the Safe C Library. Then there are all the buffer overflows, because
functions handed a pointer have no idea what the limits are, and no array
bounds checking is possible."

PDS: The root of all of this -- is that C, being a low-level,
close-to-the-hardware programming language designed in the 1970s (some in
academia pejoratively call it a "glorified assembler"), was not designed with
a proper string storage class as we know them in programming languages today;
instead, arrays of characters were substituted for this purpose, and those
arrays were not implemented to carry total size (length) and dimensionality
information.

Basically an array in C -- is a block of contiguous memory, which has a
starting address (the pointer passed), and a stated element size that the
compiler knows about, but not the length (aka, total size, element count,
etc.) of that array, nor its dimensionality.

Observation: C's arrays need length information signalled in an out-of-band
fashion (that is, this information cannot exist as a zero (0) -- somewhere in
the array).

The irony of all of this is that C was invented at AT&T, and AT&T for the
longest time had difficulty with phreakers exploiting 2600 Hz signals to gain
access to its long distance trunk lines, from which they could call
anywhere in AT&T's system for free.

But, that's what the engineering error of in-band signaling generates...

C, by using arrays to implement strings, and letting the zero terminator
(information about string length) exist in the memory space of the string,
made exactly the same engineering mistake -- just in software -- and that is
the mistake of in-band signaling.
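
To make the in-band versus out-of-band point concrete, a small sketch (the `counted_str` type, the `COUNTED_LIT` macro and the function names are hypothetical): a zero byte inside the data cuts the in-band length short, while a length carried alongside is unaffected.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical counted string: the length lives outside the character data. */
typedef struct counted_str {
    const char *data;
    size_t len;
} counted_str;

/* Build a counted_str from a literal; sizeof counts the final zero, so drop it. */
#define COUNTED_LIT(lit) ((counted_str){ (lit), sizeof(lit) - 1 })

/* In-band: the length is wherever the first zero byte happens to be. */
size_t in_band_length(const char *s)
{
    assert(s != NULL);
    return strlen(s);
}

/* Out-of-band: the length travels with the pointer, so zero bytes are just data. */
size_t out_of_band_length(counted_str s)
{
    return s.len;
}
```

For the literal `"ab\0cd"`, the in-band length stops at the embedded zero, while the counted form still reports all five characters.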

Now, that being said, hindsight is 20/20, and it couldn't be expected that
Dennis Ritchie, who invented C in the 1970s, would have foreseen the
consequences of that engineering "mistake" (AKA, "act which generated quite
the _education_ for a future populace". <g>).

Such is the price of being an innovator.

On the one hand, he advanced computer technology greatly -- far beyond the
technology advancements created by most of his contemporaries...

On the other, that advancement gave us this highly educational engineering
"mistake" -- that we can all learn from!

Such is the price of being an innovator -- and pressing the "bleeding edge" of
what is possible...

Humanity could not be advanced without such innovators, and the occasional
future flaws (and the wisdom that comes from examining them in hindsight!)
that their innovations generate...

------
tus88
> Conflating pointers with arrays.

AND Strings.

FTFY.

~~~
yarrel
C doesn't have strings. ;-)

~~~
Snarwin
More precisely, it doesn't have a string _type_.

~~~
Skunkleton
It certainly has string literals, which are a kind of type.

~~~
mark-r
That's the thing, they're not a type - they're just an array of char with an
unspecified length, indistinguishable from any other array of char or pointer
to char. Only the convention of ending them with a null character makes them
usable at all.

~~~
Skunkleton
printf("%zu\n", sizeof("Well, actshully")); -> 16

~~~
rightbyte
One could say that C has a different type for each string size plus the joker
pointer to char.
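
A sketch of that observation in C11, with made-up names (`IS_CHAR4_ARRAY`, `size_after_conversion`): each literal has an array type whose length is part of the type, until it converts to a pointer to char.

```c
#include <assert.h>
#include <stddef.h>

/* Each literal is an array of char whose length includes the terminating zero. */
static_assert(sizeof "abc" == 4, "char[4]");
static_assert(sizeof "Well, actshully" == 16, "char[16]");

/* &"abc" has type char (*)[4]; _Generic can tell literal sizes apart by type. */
#define IS_CHAR4_ARRAY(lit) _Generic(&(lit), char (*)[4]: 1, default: 0)

/* The "joker": any literal converts to a pointer to char, and the length is gone. */
size_t size_after_conversion(const char *s)
{
    (void)s;
    return sizeof s;
}
```

`IS_CHAR4_ARRAY("abc")` selects the `char (*)[4]` branch, while a literal of any other length falls through to the default.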

