I started coding in the 70s when C first came out. Conflating arrays and pointers was, to me at the time, a huge win. I agree it was a mistake in the long run, but given the state of the art at the time, it gave the high-level programmer power that until then only assembly-level developers had. I was able to move code to C that had previously been written in assembly. That's the benefit that people today miss.
Getting into Rust recently, I think it shouldn't be undervalued how much a good built-in standard library matters. In C there is so much coding just for the small things that you take for granted coming from higher-level languages. In Rust, it's just there.
Not sure why you got downvoted for speaking the facts. Arguably the needs of the standard library of a language designed in the 70s compared to today are very different.
That said, maybe a new C standard should come up with an all-batteries-included standard library upgrade. On second thought, imagine the compiler snafu.
As someone who works in C, C actually needs less in the standard library. What it actually needs is a proper import/module/build system to make it easier to stitch code and libraries together. The prevalence of header-only libraries attests to what a total mess it all is.
A proper import system would massively reduce the amount of wheel reinventing because then it would not be fiendishly difficult to actually reuse code in a coherent way. Only after that does it make sense to start quibbling about what is included where and by default.
I'll second this. The biggest thing I think C needs (besides what the author of the linked article is saying) is proper namespaces and modules. It holds the language back so much.
Tagged union types would also be really, really nice.
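Roughly, the pattern everyone hand-rolls today looks like this (a minimal sketch, all names made up); a built-in tagged union would let the compiler verify that the tag matches the member you access and that switches are exhaustive:

    enum shape_tag { SHAPE_CIRCLE, SHAPE_RECT };

    struct shape {
        enum shape_tag tag;    /* nothing ties this tag... */
        union {
            struct { double radius; } circle;
            struct { double w, h; } rect;
        } u;                   /* ...to which member of u is actually live */
    };

    double area(struct shape s) {
        switch (s.tag) {
        case SHAPE_CIRCLE: return 3.14159 * s.u.circle.radius * s.u.circle.radius;
        case SHAPE_RECT:   return s.u.rect.w * s.u.rect.h;
        }
        return 0.0;  /* unreachable, but the compiler can't prove it */
    }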
> Arguably the needs of the standard library of a language designed in the 70s compared to today are very different.
C supports modularity since the 70s. You do not need a standard library with everything and the kitchen sink. The standard library is not the only component you're able to consume.
If there's a lesson from high level languages such as python and node.js, it's that being able to consume third party libraries is all anyone needs to get stuff done, and standard libraries bring convenience to the table but they aren't critical to get stuff done.
I think today we understand more about what a language needs to provide to enable modularity. One issue that comes up over and over with interoperating libraries is confusion around ownership. If you pass a char* to me, do I own it now? Is it my responsibility to free it? Or is it just a short-lived reference to data that's still yours? Garbage collected languages can kinda sorta sidestep these questions, at least with memory-use-after-free, and they're more likely to tolerate the costs of immutable data structures and defensive copies. But in C this question is critical, and we screw it up constantly, especially at the interface boundary between libraries. C++11 does much better, with first class support for moving ownership through a function call, and Rust goes further with lifetimes and the borrow checker.
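To make that concrete, here's a hypothetical signature (all names made up). Three incompatible contracts, one type:

    struct widget;  /* some opaque library type */

    /* Which contract does this promise? Only the docs, if any, can say:
       1. borrow:   callee reads or copies name; caller still owns it
       2. transfer: callee will free(name) later; caller must forget it
       3. alias:    callee stores the pointer; caller must keep it alive */
    void widget_set_name(struct widget *w, char *name);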
Another point that's kind of in the weeds is the atomic memory model, which C didn't have until 2011 (and I think C++ gets most of the credit here for formalizing it). I'm still waiting for proper support for C11 atomics in MSVC. Multithreading has been ubiquitous for a while, and while it's possible for an application to decide that everything's going to be single-threaded, that's not an option for a portable library. Without standard atomics, you have to resort to platform-specific extensions to do basic stuff like initializing globals.
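Even something as basic as lazy one-time init of a global needs atomics to be portable. A minimal sketch of what C11 finally makes expressible (the names and the spin-wait are illustrative, not a hardened implementation):

    #include <stdatomic.h>

    static _Atomic int state = 0;  /* 0 = uninit, 1 = in progress, 2 = ready */
    static int config;             /* the lazily initialized global */

    int get_config(void) {
        int expected = 0;
        if (atomic_compare_exchange_strong(&state, &expected, 1)) {
            config = 42;  /* stand-in for the real one-time setup */
            atomic_store_explicit(&state, 2, memory_order_release);
        } else {
            while (atomic_load_explicit(&state, memory_order_acquire) != 2)
                ;  /* another thread won the race; wait for it to finish */
        }
        return config;
    }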
Passing an array as a pointer is definitely a mistake. Not only is the size of the container lost, the dimensionality of the array is lost as well. That a pointer to a pointer is syntactically indistinguishable from a pointer into a 2D array leads to a great deal of confusion in any codebase with a modicum of complexity.
But likely neither of these is as big as letting void * represent any arbitrary construct, be it pointer or data structure or the universe itself. Finding references to void * in a codebase is akin to crossing the event horizon of a black hole.
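The decay itself is easy to demonstrate; the outer dimension simply evaporates at the function boundary (output assumes 4-byte int):

    #include <stdio.h>

    void f(int a[3][4]) {           /* silently adjusted to int (*a)[4] */
        printf("%zu\n", sizeof a);  /* size of a pointer, not of the array */
    }

    int main(void) {
        int m[3][4];
        printf("%zu\n", sizeof m);  /* 48: the full size, known here... */
        f(m);                       /* ...and gone inside the callee */
        return 0;
    }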
"No proper arrays/slices" is definitely up there, but I'd nominate "no dedicated error handling" for the #1 spot, by virtue of how much trouble it causes compared to how easily it could be fixed. Go has a reasonable and remarkably lightweight approach here: add just enough of a tuple types feature to support multiple return values, and then make the rightmost return value an error flag by convention. Easy! No more "zero indicates success unless the return type is a pointer" nonsense. Bonus points for something like `error` that gives tooling a clue, but even that we could live without.
> Go has a reasonable and remarkably lightweight approach here
Go literally hardcoded the same error-prone, errno-like error handling; it just has some syntactic sugar now. It still doesn't compose, is unreadable, and gives only the impression of properly handled error cases. It even fails to return a proper sum type, so you can get back a possibly important value and an error value at the same time.
Yeah exactly, there's like zero difference to the compiler. That's why it's such an easy feature. (To be fair, you probably also want some syntactic niceties like destructuring bindings.) But clearly, in practice, no one wants to define a new named struct type for every function that needs to return an error. I think part of the pain is that, without some sort of "auto" type deduction feature, you force the caller to type out the name of the return type most of the time, in addition to the name of the function. And yeah, if I had to type out StringAndError or IntAndError a million times to use a library, I'm sure I'd hate that library :)
1. In a world without templates/generics, you have to manually define a different such struct for every possible return type. Can be macro'd at the function definition, but it's ugly.
2. Without first-class support, it's very difficult to assert that the error is actually being checked. Consider:
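// note: the returned struct's error field is silently dropped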
int someVal = getSomeVal(...).v;
doSomethingWith(someVal);
Which is easier to catch at code-review time, or even at linter time?
In most languages that don't support multiple return values, both problems are solvable with templates or local equivalent - see, for instance, absl::StatusOr for C++ (https://abseil.io/docs/cpp/guides/status).
>Can be macro'd at the function definition, but it's ugly.
I wonder if typeof in C23 has changed this at all. Previously there was no sense in defining an anonymous struct as a function's return type. You could do it, but those structs would not be compatible with anything. With typeof, maybe that's no longer the case.
e.g. with clang 16 and gcc 13 at least this compiles with no warning and g() returns 3. But I'm not sure if this is intended by the standard or just happens to work.
struct { int a; int b; } f() {
    return (typeof(f())){1, 2};
}

int g() {
    typeof(f()) x = f();
    return x.a + x.b;
}
edit: though I suppose this just pushes the problem onto callers, since every function that does this now has a distinct return type that can only be referenced using typeof(yourfn).
I think most of the time people really want a sum type, NOT a tuple. i.e. you can get back either a value OR an error, but not both. In Go, there's the pattern of returning nil, err, which is a hint that we really want a sum type there.
> In most languages that don't support multiple return values, both problems are solvable with templates or local equivalent
I don't really see the difference between tuples in C++ and multiple return values in Go (since you cite C++ as a language that doesn't have multiple return values).
val, err := do_something()
or
auto [val, err] = do_something();
What's the real difference there? Syntactic sugar?
I agree, you really want a sum, not a tuple - that's why my example was absl::StatusOr<T>, not std::pair. But a tuple plus tooling support (e.g., linters in Go that don't allow you to not use the returned error value) can emulate a sum type.
> 1. In a world without templates/generics, you have to manually define a different such struct for every possible return type. Can be macro'd at the function definition, but it's ugly.
one trivial option is to define in/out parameters and just return error codes, along these lines:
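    /* Minimal sketch (made-up names): the status comes back as the return
       value, the results come back through pointers the caller provides. */
    int divide(int num, int den, int *quot, int *rem) {
        if (den == 0)
            return -1;      /* error: the outputs are left untouched */
        *quot = num / den;
        *rem  = num % den;
        return 0;           /* success */
    }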
> I never understood why "multiple return values" is a thing anyone should care about.
For the same reason people care about multiple parameters. There's no reason that the data coming out of a function should be in any way more constrained than the data going into the function.
Local variables are stack frame dependent. It makes perfect sense to make an array a pointer when using it outside the stack frame. It forces the programmer to actually think about what is going on underneath.
I'm a big C proponent, via C++, I've thought about this article on-and-off since it first came out. My team uses C++ to define new container types & some type-safe translation stuff. If I think about it, our containers are — at their root — fancy fat pointers.
Fixed but not enforced (which makes sense given compatibility with C). And that's often the problem, as each next "flaws are now fixed" iteration adds +1 method to do the same thing.
C's biggest mistake was originated in 1972 by people trying to create a language for writing operating systems, instead of being originated in 2000 by people trying to create a language that didn't allow you to make the mistakes you could make in C.
Yeah, yeah, yeah. C is horrible, and programmers are all idiots because they didn't see how wonderful the stuff you like is. Sure. To steal a line from Jane Austen (or at least a film adaptation), "You tell yourself that, if it gives you comfort."
The reality is that your approach didn't meet the needs of the real world as well as the C approach did. You may dislike it, you may resent it, but your way didn't work very well.
It's not only because Unix was free, and C came along for the ride. It's also that Multics took several times as long to write, and required bigger hardware. (So Unix would have been cheaper even if the OS was not free, both because of lower cost to write, and because of lower hardware cost for a system to run it on.) And less cost opened up far more possible uses, so it spread quickly and widely. (How many Multics installations were there, ever? Wikipedia says about 80.) Which is better, the language and OS that have flaws, or the language and OS that run on hardware you don't have and can't afford?
And Unix was far more portable. Didn't have a PDP-11 either? No worries; it was almost entirely written in C. You could port it to your machine with a C compiler and a little bit of assembly (a very little bit, compared to any other OS). Didn't have C either? It was a small language; it wasn't that hard to write a compiler compared to many other languages. If you were a university, you could implement C and port Unix on your own. But once one university had done so for a particular kind of computer, everyone else could usually use their implementation.
And the final nail in the coffin of your style: Programmers of that era preferred that languages get out of their way far more than that languages hold their hand. Your preferred kind of languages failed, because they didn't work with people. It may have worked with the people you wanted to have, but it didn't work with the people that actually existed as programmers at the time. A tool that doesn't fit the people using it is a bad tool.
But all is not lost for people like you. There are other languages that fit your preferences, and you can still use them. Just stop expecting them to become mainstream languages. The languages that the majority of programmers found better (for their use) are the languages that won. And that majority was not composed entirely of fools and sheep.
All the languages I care about to stay away from C are mainstream.
I have not had any reason to touch C since 2001, other than to keep up with WG14 work, exploit analysis, and security mitigations from SecDevOps point of view.
History lesson, there were other OSes other than Multics.
Also, UNIX only became written in C with Version 4, and contrary to urban myths spread by the UNIX church, it wasn't the only OS written in a portable systems programming language.
It was the only one that came with free beer source tapes.
Now with governments looking into cybersecurity bills, let's see how great C is all about.
Were any others written in a portable language, that could also run on something as small as a PDP11? Were any of them as easy to port as Unix was? Were any of them written in a language as easy to implement as C was?
If not, then you didn't really answer what I said. You made an argument that used some of the same words, but didn't actually answer any of the substance.
The security complaints... yeah. A safer language would give you a smaller footprint of vulnerabilities, which would have made your life easier for last two decades. (Maybe you earned being bitter about C.) That's unrelated to the spread of C and Unix in the 1970s and 80s, though.
Ah yes, the PDP-11 example, as if hardware from 1958 - 1972 was more powerful.
Yes, they were as easy to port as UNIX was, for those that owned the code, Unisys still sells Burroughs as ClearPath MCP, nowadays running perfectly fine in modern hardware.
1980s, he says,

"Many years later we asked our customers whether they wished us to provide an option to switch off these checks in the interests of efficiency on production runs. Unanimously, they urged us not to--they already knew how frequently subscript errors occur on production runs where failure to detect them could be disastrous. I note with fear and horror that even in 1980, language designers and users have not learned this lesson. In any respectable branch of engineering, failure to observe such elementary precautions would have long been against the law."

-- C.A.R. Hoare in 1980, during his Turing Award speech.
"Oh, it was quite a while ago. I kind of stopped when C came out. That was a big blow. We were making so much good progress on optimizations and transformations. We were getting rid of just one nice problem after another. When C came out, at one of the SIGPLAN compiler conferences, there was a debate between Steve Johnson from Bell Labs, who was supporting C, and one of our people, Bill Harrison, who was working on a project that I had at that time supporting automatic optimization...The nubbin of the debate was Steve's defense of not having to build optimizers anymore because the programmer would take care of it. That it was really a programmer's issue.... Seibel: Do you think C is a reasonable language if they had restricted its use to operating-system kernels? Allen: Oh, yeah. That would have been fine. And, in fact, you need to have something like that, something where experts can really fine-tune without big bottlenecks because those are key problems to solve. By 1960, we had a long list of amazing languages: Lisp, APL, Fortran, COBOL, Algol 60. These are higher-level than C. We have seriously regressed, since C developed. C has destroyed our ability to advance the state of the art in automatic optimization, automatic parallelization, automatic mapping of a high-level language to the machine. This is one of the reasons compilers are ... basically not taught much anymore in the colleges and universities."
-- Fran Allen, excerpted from Peter Seibel, Coders at Work: Reflections on the Craft of Programming
" We really are using a 1970s era operating system well past its sell-by date. We get a lot done, and we have fun, but let's face it, the fundamental design of Unix is older than many of the readers of Slashdot, while lots of different, great ideas about computing and networks have been developed in the last 30 years. Using Unix is the computing equivalent of listening only to music by David Cassidy. "
-- Rob Pike on his Slashdot interview
And to finish on security,
> The combination of BASED and REFER leaves the compiler to do the error prone pointer arithmetic while having the same innate efficiency as the clumsy equivalent in C. Add to this that PL/1 (like most contemporary languages) included bounds checking and the result is significantly superior to C.
> Although we entertained occasional thoughts about implementing one of the major languages of the time like Fortran, PL/I, or Algol 68, such a project seemed hopelessly large for our resources: much simpler and smaller tools were called for. All these languages influenced our work, but it was more fun to do things on our own.
Two points in response. Then I'm done. You can have the last word if you want it (and if you're still reading this, this long after the initial post.)
First, "such a project seemed hopelessly large for our resources: much simpler and smaller tools were called for." This actually is what I was arguing - you could do things (like write or port an OS) with a much smaller group, which opened doors that were closed by more "advanced" tools/languages. The barriers to entry were lower, so lot more people were able to do a lot more things. There was massive value in that.
Second, it's not like the legislature made it illegal to work on all these other approaches. They were abandoned because the people working on them abandoned them. Nobody put a gun to their head.
This kind of goes back to the first point. C opened a lot of doors. Research stopped in some directions, because new areas opened up that people found more interesting.
In fact, this whole thing has a bit of a flavor of the elites mourning because the common people can now read and write, and are deciding what they're going to read and write, and it's not what the elites think they should. You (and the people you quote) are the elites that the people left behind. You think that yours is the right path; but the people disagree, and they don't care what you think.
(The point about hardware from 1958 being less than a PDP11 I will concede.)
> The nubbin of the debate was Steve's defense of not having to build optimizers anymore because the programmer would take care of it.
C compiler development has forgotten this. C compilers must not be too clever in optimizing. They should generate tidy code that allocates registers well, and puts local variables into registers well, and peepholes away poor instruction sequences. But all the memory accesses written in the program should happen.
Maybe Steve and all the early C people were philosophically wrong, but that's the language they designed; it should have been respected. In C, how you write the code is supposed to matter, like in assembly language.
The problem with this approach, as anyone who coded on 8- and 16-bit home computers is well aware, is that anyone with a medium skill level in assembly programming could easily outperform the code generated by classical C compilers. Compiler vendors had to win those compiler benchmarks in the computer magazines, so optimize by exploiting every UB trick in the book they did.
Not by a long shot. The ability to switch between array notation and pointers is fantastic. People crying about it in 2023 is the same as crying about why a 1971 Dodge Challenger doesn't have ABS and airbags.
Zig in particular could scarcely do more to scream "I'm a replacement for C" without maybe asking WG14 to write "Just use Zig instead" in their next document.
Zig goes too far. It’s a radically different language with different syntax and very complex features such as comptime. People who want the issues of C fixed generally want to keep using C but without the issues.
I agree with OP, but just for the sake of argument: you can use an array as a member of a struct which also contains the length, dimensions, and other data-specific details in other fields. If the problem is that this not being part of the language means everyone will do it their own way and will have to write a ton of extra code to support it, then yeah, that sucks, but aren't most things in C like that anyway?
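A minimal sketch of that pattern (one of the countless hand-rolled variants):

    #include <stddef.h>

    struct int_array16 {
        size_t len;       /* number of elements in use */
        int    data[16];  /* capacity baked into the type */
    };

    long sum(const struct int_array16 *a) {
        long total = 0;
        for (size_t i = 0; i < a->len; i++)
            total += a->data[i];
        return total;
    }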
But in reality it is of limited utility. This fixes the size of the array in the function signature. If I want to write a function that returns the sum of all the integers in an array, I would need to write a new function for each size of array, a non-starter.
The annoying thing about this is that the array is just sitting there on the stack of the calling function. The size is completely known at compile time. But it gets thrown away as soon as you call a function.
> This fixes the size of the array in the function signature
That's what an array is. Fixed size. If we want to talk about a slice of some runtime determined amount of a thing then we need fat pointers, and C doesn't provide any fat pointer types so it's not a surprise to find it can't do slices.
No, I am not talking about a runtime-allocated structure. I’m talking about a function that can accept and operate on arrays of different sizes, all of which are fixed size at their declaration site, and all stored on the stack.
There is no reason this shouldn’t be possible. It should not require fat pointers at all because the size information is known at compile time.
AFAICT you’d either have to pass the length at runtime somehow, or treat the function as a generic/template and generate a separate version for each separate N it’s called with. The latter would be pretty far from the ethos of C, imo.
This could be done by storing the length of the array before the first element. It wouldn’t even need to be done in all cases, only if the length is actually used.
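A compiler could do that transparently; hand-rolled in user code today, at least for heap arrays, it might look like this (hypothetical sketch):

    #include <stdlib.h>

    /* Stash the element count in a header just before the data, so a
       plain int * can recover its own length. */
    int *int_array_new(size_t n) {
        size_t *hdr = malloc(sizeof(size_t) + n * sizeof(int));
        if (!hdr) return NULL;
        *hdr = n;
        return (int *)(hdr + 1);
    }

    size_t int_array_len(const int *a) {
        return ((const size_t *)a)[-1];
    }

    void int_array_free(int *a) {
        free((size_t *)a - 1);
    }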
The title of the article is “C’s biggest mistake.” If you’re going to rebut the article, do so. Otherwise your reply just comes off as “this is the way it is, deal with it!” which is a pretty shallow dismissal.
There's the C99 `arr[static size]` function argument syntax, but it's still not a proper slice type. Compilers may warn about incorrect direct uses of such an argument (although last time I checked they don't), but it still decays to a pointer. You can't use this as the type of a variable or a struct field. There's no way to make subslices (safe pointer arithmetic) without losing the argument's arrayness magic, etc.
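For reference, the syntax in question looks like this (small sketch):

    #include <stddef.h>

    /* The caller promises at least n valid elements. A compiler is allowed
       to warn on an obviously short argument, but inside the function arr
       has still decayed to const int *. */
    long sum(size_t n, const int arr[static n]) {
        long total = 0;
        for (size_t i = 0; i < n; i++)
            total += arr[i];
        return total;
    }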
But I don’t want a size parameter to be passed because I don’t necessarily (want to) have the size value at the call site. People on HN can cry all day about how bad C is and blah and blah. The truth is, C is the only language that gets shit done without getting in programmer’s way.
C is not a general purpose language, it was always intended to be used by people who want performance and control. If you want to pass a potentially large array by value to a C function, it's likely that you're using the wrong language.
package main

func foo(a []byte) {
    println(len(a))
}

func main() {
    var b []byte
    // cannot use &b (value of type *[]byte) as []byte value in argument to foo
    foo(&b)
}
Not only that, but some have solved it while maintaining compatibility with null terminated strings. Null terminated strings, after all, are sometimes more efficient.
It takes 4-8 bytes to represent the size of the string versus 1 byte for a null terminator. That doubles the size of a string reference when you embed it in a struct or pass it as an argument on the stack. In particular, remember that even today cache lines are only 64 bytes on x86-64, and while that seems like a lot, going from 64 bytes to 68 means loading some struct goes from 1 cache miss to 2.
According to the bible (https://www.agner.org/optimize/), it's faster to use a loop with a length than to walk through a pointer, so not having a length will make it slower to walk the string, while also making things like SIMD optimizations harder for the compiler to do.
That doesn't make sense. If you have a loop with a length, you have to check both the content of the byte and the index; if you have null-terminated strings, you only check the content of the byte.
When you have the length, you can unroll the loop, so that you e.g. do 4 iterations at a time. With NUL you can’t do that. Moreover, loop iteration can be done in parallel (instruction-level parallelism) with processing the content of the string, since there is no data dependency between the two. With NUL you introduce a data dependency.
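A rough illustration of the structure the length makes possible (not tuned, just the shape of it):

    #include <stddef.h>

    /* Four independent accumulators let the additions overlap; a NUL scan
       must inspect each byte before it knows the next one even exists. */
    unsigned sum_bytes(const unsigned char *s, size_t len) {
        unsigned a = 0, b = 0, c = 0, d = 0;
        size_t i = 0;
        for (; i + 4 <= len; i += 4) {
            a += s[i];
            b += s[i + 1];
            c += s[i + 2];
            d += s[i + 3];
        }
        for (; i < len; i++)  /* leftover tail */
            a += s[i];
        return a + b + c + d;
    }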
I read it. I don't see your point. Sentinel values simply give you a very low performance ceiling that you can't optimise past. What I wrote only took me a couple minutes. It's far from optimal (but it already takes 2/3 the time the sentinel version takes, and 1/2 the time Lemire's range version takes). With thorough effort I bet you could get at least 10 times faster than that, provided you are given the length. With sentinels, you're just left with the original performance levels you can't improve.
You can modify the length of NUL strings in place by inserting a new NUL or overwriting past the end without any other bookkeeping. You can split a string on delimiters simply by overwriting them with NULs.
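For example, splitting in place by turning each delimiter into a terminator (small sketch):

    #include <stdio.h>
    #include <string.h>

    int main(void) {
        char buf[] = "one,two,three";
        for (char *p = buf, *comma; p; p = comma ? comma + 1 : NULL) {
            comma = strchr(p, ',');
            if (comma) *comma = '\0';  /* delimiter becomes a terminator */
            puts(p);                   /* prints one, two, three */
        }
        return 0;
    }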
I find these arguments rather weak. I don't see how writing a NUL is any more efficient than updating a length. Furthermore, having the terminators in-band prevents the character data from being used for multiple substrings. E.g. in the split example, the original string is no longer available.
Programming languages have had array and slice types since the '60s. It wasn't like C's developers didn't know their value. As a matter of fact, when C came out, its relaxed behavior with pointers and arrays was PRAISED to no end in the '80s for providing developers great flexibility to do whatever they wanted, with the compiler getting out of the way. It was thought of as a feature at the time. It wasn't an omission but a deliberate choice to do it that way. People only relatively recently started to care about the security and maintenance costs of these features.
And it was a great relief compared to my experience in, say, Pascal. You need a fairly advanced type system (Rust, Haskell, ...) to express the kind of stuff we wanted to do with static types, and a very good compiler to make it efficient. We didn't have that until maybe the mid-'90s, and trying to do it without it led either to very inefficient code or to breaking the type system.
So C was [mostly] great, but today we have better options.
Well... a large portion of the installed base wasn't running an operating system with anything approaching a "permissions model", let alone "dynamic memory", and there were several competing systems and architectures vying for the top.
So, the desire and need for that flexibility was entirely warranted.