Linus on compiler warnings and code reviews

kybernetyk · on Sept 23, 2015

I tend to treat warnings as errors. (Even set up my IDE to do so).

In the short run it maybe is annoying but in the long run it really pays off. (Have been burned by a huge pile of crap code base with hundreds of warnings + 64bit transition).

halosghost · on Sept 23, 2015

This! I religiously use clang's -Weverything -Werror. Does it mean I spend more time dealing with false-positives? yes (though honestly, not that much). Does it also mean I end up writing much better code? yes, absolutely.

blazespin · on Sept 23, 2015

It doesn't take long before you naturally write warning free code. (Which ends up making you a better coder, as well. Such a simple thing..)

halosghost · on Sept 23, 2015

I completely agree. In general, the only warnings I get on my code anymore are ones related to how a library I use does something. In those cases, I surround those particular library calls with no-warn pragmas for that particular warning and then I go about my day. It feels so much cleaner.

nickpsecurity · on Sept 23, 2015

I didn't use a lot of C or C++ but I agree from my limited experience. I noticed the warnings and errors kicked in when I did certain things. They aggravated or worried me. So, I figured out how to not cause them. Some of those became a habit even during my brief stints working in those languages. Same thing happens in other languages with good compilers.

Won't stop lazy people from ignoring them but that says more about them than the compiler. ;)

MereInterest · on Sept 23, 2015

Agreed. New code shouldn't be committed unless it compiles without error. The build server should then compile with -Werror, with all relevant shame being healed on the developer who breaks the build by letting a warning through.

ryandrake · on Sept 23, 2015

Sadly, you folks seem to be the minority out there. I can't even remember how many times I've had conversations along the lines of:

Me: Hey, that code generates 3 compiler warnings, and you shouldn't be casting that pointer here.

Them: Oh, ok, whatever, change it if you want.

Me: It's not about what I want.. the code isn't correct. It'll break on some 64-bit platforms, for instance.

Them: Well, it works on our target platform, and that's the one in the spec. If the spec changes, we fix it as a change request.

Me: But it just happens to work! Even a different compiler could result in different behavior. Can't we just put in a little effort up front to avoid trouble later?

Them: Go for it if it bothers you so much. We have real bugs to fix. Nobody cares about hypothetical problems on systems we don't support and compilers we don't use.

Me: [This is going to drive me to alcoholism]

blazespin · on Sept 23, 2015

The biggest issue with lettings warnings go is that they pile up and you start missing the important ones that point to real bugs. I have seen that happen many many times.

MereInterest · on Sept 23, 2015

I think that I managed to convince a group member that we should have "-Wall -Wextra -pedantic" enabled all the time. He had disabled the warnings because they annoyed him. Yesterday, we spent a large quantity of time hunting down a bug that could have been avoided entirely had "-Wreturn-type" been enabled.

to3m · on Sept 23, 2015

And at the same time, while the development team is having fun with the blame game, somebody is waiting for a build that never comes!

At least do it the right way round: developer machines build with -Werror, and build PC should build without. (Selective "promotion" to errors is a good idea on top of that.) That way, it's not the end of the world if people check something in that doesn't build cleanly, because they disabled -Werror on their local copy in order to get some debugging done without having to play whack-a-mole as they did so.

(Obviously, they're not supposed to do that. But ideally they wouldn't be doing any debugging in the first place, either. These things happen, and processes should be designed around that.)

Not all warnings are equal, and -Werror doesn't give you any discretion. You probably want to be kept abreast of unused functions, unused variables, unused parameters, int-to-bool casts, truncation of double literals, and so on - but they're poor justifications for stopping the build. You don't need to hold the build to ransom to get it to compile without warnings. All you need to do is take the warnings out.

SamReidHughes · on Sept 23, 2015

I've never had big blocking problems like that with -Werror on all the time. You get one? Fix it and move on. When you have a large enough organization or enough different compiler versions at play, disable the specific warnings that seem to get you in trouble between compiler versions.

MereInterest · on Sept 23, 2015

Not all warnings are equal, but all warnings get printed to screen. If there are heaps and loads of ignoreable warnings, then I can't tell whether somewhere in that mess is a warning that I should be paying more attention to. Better to have a policy that no warnings should occur.

to3m · on Sept 24, 2015

I'm not saying they shouldn't be fixed. They should. They just don't need to be errors.

dllthomas · on Sept 24, 2015

The biggest place this breaks down in my experience (which is not to say it's not a win overall!) is when a compiler upgrade introduces a new warning. There's several ways it can play out, some of which aren't a bad thing overall, but most of which involve some pain.

Animats · on Sept 23, 2015

Biggest flaw in the design of C: no real array parameters. Source of most buffer overflow vulnerabilities for thirty years. It's a fixable problem [1] but it's not going to get fixed. We need to move forward to Rust, or something like it, and get rid of C for anything for which security matters.

[1] http://animats.com/papers/languages/safearraysforc43.pdf

kybernetyk · on Sept 23, 2015

>We need to move forward to Rust

That's the second time in two days I've come to read offensive ranting about a language + "let's move to Rust" as the proposed silver bullet solution here on HN.

If this is the way the Rust community tries to evangelize their language I'm inclined to keep myself as far away as possible from it.

Now don't get me wrong: I think Rust is fine. But that "C/C++ is dumb and everyone who continues to use it is also dumb and/or ignorant" sentiment won't win you any friends. There are valid reasons why people keep using C++ and can't switch to any other language in the near future.

klodolph · on Sept 23, 2015

I don't see how this is offensive. It is well established that most programmers will produce defects at a relatively high rate when writing C or C++. Rust "or something like it" does effectively eliminate certain categories of defects. Java evangelists have been claiming the same thing since the late 90s, and they were right... buffer overflows in Java programs have most commonly occurred in parts of the Java library written in C, and Java has given us productivity + quality gains in various fields (mostly server-side web applications).

Nobody's saying that C/C++ or its users are dumb and/or ignorant. The cost of switching languages is high, and Rust is not very mature.

pmelendez · on Sept 24, 2015

>It is well established that most programmers will produce defects at a relatively high rate when writing C or C++

It's anecdotical, but most of actual C++ programmers I know won't produce more defects than Java programmers for instance. In the last 5 years I have seen buffers overflows just once and was in a legacy code (and C++ has been my main language in that period). Java has its own sets of defects (such as resources leaking).

In my opinion are good practices like code reviews the ones that help to produce good code and not the choice of the language.

ArkyBeagle · on Sept 23, 2015

Using a language to solve a practices problem is the same with any other attempt to use technology to correct behavior. It's not completely unjustified, but don't expect good results.

If you have to write 'C', learn 'C'. I'd say the same of C++, but by the time you learn it, it's become something different :)

klodolph · on Sept 23, 2015

Array bounds errors are a common source of problems in C. Fixing these errors with behavioral changes is expensive. It increases the amount of training and increases the cognitive load on programmers, and yet errors will still slip by because we are imperfect. On the other hand, if I add bounds checking to my programming language, then I've just transformed serious vulnerabilities into simple runtime errors. Computers are easier to reprogram than brains are.

ArkyBeagle · on Sept 24, 2015

Within ... months(2? 3?) of starting learning 'C'. I no longer produced code that was capable of array bounds violations. Of course you slip up now and again, but it's just a matter of habits.

This was in 1987 or so. Perhaps that was unusual; I don't know. It was part of the culture where I worked. I'd programmed professionally in Pascal before, so I'm familiar wit the compare/contrast here. Tactically, it was safer, because 'C' covered more than did Pascal, and the balance of the code when you used pascal was assembly. Perhaps the assembly was sufficiently dangerous that we didn't make as many mistakes, so who knows? We felt assembler was more of a liability.

The choices were more limited then. Performance, shmerformance - darn near everything runs fast enough, now, with the exception of a few grotty corners.

Rust/Python/what have you are all very nice. I just sympathize with the poor people who get drafted into working on 'C' code bases without seeming to be happy about it. I personally would find somebody to mentor me a bit and review the code if I were in those shoes. Perhaps that sort of thing has passed; shame if it has.

sangnoir · on Sept 23, 2015

How about "taking heed of compiler warnings" as suggested by Linus? it's niether expensive nor mentally taxing

nickpsecurity · on Sept 23, 2015

How about eliminating the need for them in the common case like other systems languages did? Why should you have to worry every time you do a common operation if the alternative has acceptable performance and is actually more productive?

And I'll add that most of the older, safe languages let you turn off safety for performance if you have to. You just then have to use you brain like you would with C or whatever. Just for that module, though, along with calls into it.

ArkyBeagle · on Sept 23, 2015

Which costs more, the errors or the behavioral changes?

pcwalton · on Sept 23, 2015

> Using a language to solve a practices problem is the same with any other attempt to use technology to correct behavior. It's not completely unjustified, but don't expect good results.

OK, but the fact remains that undefined behavior due to bounds checks is orders of magnitude less frequent in, say, Ruby than in C. I have a hard time seeing this as anything other than a good result.

ArkyBeagle · on Sept 24, 2015

I can't argue with that at all. I'm just saying if you're stuck using 'C', it's worth learning the strategies and habits needed to prevent unsafe code.

There are issues other than safety that influence these decisions. Rust & Lua are also closer to 'C' than is Ruby, so ....

kibwen · on Sept 23, 2015

It's a mental leap to equate random commentators on HN with any community in particular. :P

You're also mischaracterizing Animats' comments as "offensive ranting". That's John Nagle (this guy: https://en.wikipedia.org/wiki/Nagle's_algorithm) and he knows his way around C.

steveklabnik · on Sept 23, 2015

"C and C++ is dumb and everyone who continues to use it is also dumb and/or ignorant" certainly isn't the position of the Rust team nor the general Rust community at large.

klodolph · on Sept 23, 2015

It's not even the position expressed in the parent comment.

steveklabnik · on Sept 23, 2015

It is how my original parent perceived it, and perceptions matter.

I wouldn't work on Rust if I didn't think it was important and a step forward, but even technologies with flaws (read: all of them, Rust has flaws too) are worth using, depending on your situation. Trade-offs are the heart of engineering.

klodolph · on Sept 23, 2015

Perceptions matter, but language choice is a very personal issue and I've noticed that people can get very defensive when they perceive attacks on their language of choice (such as C++) or support for a language they hate (such as PHP). We should continue to talk about language choice because it's important, and we should avoid getting into pissing matches when we do it.

I'm convinced that in order to make this work, we have to be more generous in how we perceive other people's comments.

nickpsecurity · on Sept 23, 2015

I agree. I have an example. I'm an opponent of both C and C++ who goes so far as to call them garbage. In some ways, I throw them in the same camp. However, C++ supporters consistently argued that it was a vast improvement over C even in safety and most C-like problems in came from "using C++ like C." It took a while but a few eventually supported that position with specific evidence (example below) that made me adopt their position. Not to mention same empirical studies I cite showed C++ apps having less defects than C apps.

https://news.ycombinator.com/item?id=10208786

So, there's an example where I had to step back, assume other person has intelligent reasons, and let the person show them with good results coming out of it. I still have a use of "C/C++" in these discussions but now do it less often due to superior safety over C of modern C++ development. New standards mean I'll probably be hit with a similar conversation again lol.

devit · on Sept 23, 2015

It's because Rust is the first production language that is safe and strictly better than C (and that is likely to be strictly better than C++ as well once it gains some missing features).

For decades, it has been known that C and C++ are bad languages due to their unsafety, and the use of C/C++ is probably the top cause of bugs and security holes in software; however, switching required giving up performance or features.

Now that finally a solution to this problem has become available after so much time, it's natural that people are enthusiastic about it.

nickpsecurity · on Sept 23, 2015

That was Ada and a bunch of Wirth languages which both got regular updates going back decades. Most developers ignored all of them. Rust is a new one going mainstream. As usual, mainstream acts like everything else ever happened. An exception is other one, Go, which intentionally copies Wirth's languages in design. Usually stuff is ignored, though.

kibwen · on Sept 23, 2015

  > As usual, mainstream acts like everything else ever 
  > happened.

Not Rust. The reference has a long list of languages which have been influential (https://doc.rust-lang.org/reference.html#appendix:-influence...) and the official book has a list of research papers which have helped to guide the design of the language (http://doc.rust-lang.org/book/academic-research.html).

nickpsecurity · on Sept 23, 2015

I meant the commenter as the example of mainstream audience that acts like other things didn't do it first. Wasn't sure what Rust's official position was. Thanks for link as I'll keep it in mind for future discussions. :)

Note: Strange that they draw no inspiration from Ada despite it countering many issues by design. I figure at least its tactics if not syntax would be a nice start on a new language. Least they borrowed from and built on a lot of good ones, though.

kibwen · on Sept 23, 2015

Ada actually takes a different approach to memory safety than Rust. Ada requires garbage collection if you want to dynamically allocate, and only statically ensures safety without a GC if you opt into the subset where only static allocation is allowed. Ada also has at least one correctness-aiding feature that Rust lacks, which is the ability to define integral types with an explicitly bounded range.

nickpsecurity · on Sept 23, 2015

The consistent point, as I mentioned, is that Rust isn't the first, safer, systems language. There's quite a few before it. It does take a different approach and I like Rust's innovations in this space. I'm keeping a distance for now to let it and the associated coding styles evolve. Just reading the experience reports, articles, HN comments, etc for now.

Also watching Julia as it has a host of good features with potential to take on Python and R at same time. Especially macros and painless C calls.

Note: Btw, good call on integral types. That's quite beneficial. Also, existential types would prevent all sorts of issues that happen when two things are same size but should be treated differently. Like a mi-to-km mismatched that was a downer for one program in particular. ;)

kibwen · on Sept 24, 2015

I agree, Rust isn't the first language to exist in the "safe systems language space". But it's a recurring meme that Rust is somehow related to Ada, when in reality they are divergent lineages. :)

  > Also, existential types would prevent all 
  > sorts of issues that happen when two things 
  > are same size but should be treated differently.

I don't know what existential types are, but we use phantom types for that: https://blog.mozilla.org/research/2014/06/23/static-checking...

nickpsecurity · on Sept 24, 2015

Oh, I didn't know about that meme. Explains your reaction to my post. Oh no, mine is more meme that people ignore what exists and miss potential benefits (or just accuracy) for various reasons. So, a consideration of safe, systems programming should include most mature tools (eg Ada) plus any newcomers reaching maturity (eg Rust). I wasn't implying Rust had any connection to Ada: more surprised Ada had no influence on it than anything. I'll be sure to look out for that other meme, too, to call them on their BS. ;)

"I don't know what existential types are"

Unfortunately, I couldn't find a non-dense explanation of it that had relevant detail. Good news is the Ada book I've been referencing has a great explanation with examples. See Ch 2 "Safe Typing" for the answer in the first, two pages. They call them "Distinct Types" in there (shrugs). Included link to whole book in case you see safety tactics worth copying.

http://www.adacore.com/knowledge/technical-papers/safe-secur...

So, they make sure the types and implementation are separate so unique types ("existential types") can catch interface issues that can be subtle and nasty. Casts work around them where necessary while making the risk more obvious. Aim to try to keep overhead down compared to heavy OOP + regular types. Seeing it made me wonder, "Why don't the rest have this!?"

"but we use phantom types for that"

Thanks for the link! Adding it to my list of things to read and review. :)

devit · on Sept 24, 2015

Ada doesn't support safe GC-less memory allocation, which means it's either unsafe, far worse feature-wise than C or requires a GC which also makes it not strictly better than C.

As a consequence, it seems that if you use Ada's GC, you might as well use Java instead, and if you don't you might as well use C++ instead, which along with its unattractively verbose and not C-like syntax is probably why Ada is not very popular despite having been around for 30 years.

nickpsecurity · on Sept 24, 2015

Ada does most allocation, including local dynamic, on the stack. From there, for the heap, you get to choose whether to manage individual pieces of memory, work from whole pools, or do GC. With these, you get help from the type system in catching or preventing errors. Past that, there's the usual risks. Requires a lot less memory management and runtime overhead than Java while preventing way more problems than C.

And, as I often say, the empirical studies all showed C doing far worse in terms of defects introduced. Ada benefiting safety & productivity over C is a proven fact rather than an opinion. So, questions are: (a) should we use it for this project? (b) should we build on a proven solution and make it better?, and especially (c) should we borrow anything in this that made it effective when designing a new language? I especially push for (c) to be applied any time new tech is built. Ada is my reference for safe, systems programming because it did what it promised and is the oldest one still updated/used. Also turned out almost as future-proof as COBOL with a lot safer, easier maintenance. Such longevity, along with safety, benefits enterprises writing mission-critical software.

Details on various safety mechanisms:

http://www.adacore.com/knowledge/technical-papers/safe-secur...

Far as replacements, Rust is looking promising. Needs to mature a bit and better tools for various situations. A great design, though, with a lot of potential. Not sure how it would fare for embedded, though: Ada was used in microcontrollers and stuff.

steveklabnik · on Sept 24, 2015

Rust is actively used on embedded projects. The sticking point is mostly if the architecture is one that LLVM supports, which means that are some platforms we can't support.

nickpsecurity · on Sept 24, 2015

Appreciate the clarification on that.

steveklabnik · on Sept 24, 2015

Any time. It's hard to remember all the details of tons of different languages...

nickpsecurity · on Sept 23, 2015

"But that "C/C++ is dumb and everyone who continues to use it is also dumb and/or ignorant" sentiment won't win you any friends."

Friends are a social, not technical, matter. Far as C, using it outside of necessity (eg legacy or critical libraries) is dumb for projects aiming for robustness. It's not even opinion so much as empirically-backed fact: every study the military and researchers did on C vs other programmers back in the day showed the C people had highest defect rate and among lowest productivity [often due to debugging]. Many were an investigation into Ada vs other languages w/ Ada cutting defects in half not being uncommon. Later Java studies showed same problems with C and benefits by ditching it albeit with much higher overhead. ;) The CVE's continue to reflect the same with Java and Ada being interesting as many of their apps flaws occurred when they leveraged a C library.

Like Ada's style or not, it kept defects and overhead low by systematically [1] looking at where problems happened and implementing solutions to them at language/compiler level. Modula-2 did the same thing to a lesser degree. I believe PL/S, a systems variant of PL/1, also had design choices that boosted safety partly inherited from PL/1. Burrough's ALGOL (esp NEWP) did quite a bit in language to reduce risks. These were used in reliable software ranging from mainframe OS's to desktops to embedded. They also predate the C++ language which also attempted to solve C's issues albeit with great C compatibility.

So, there are numerous languages whose coders can build robust software at a similar or better pace to C coders building buggy software. These languages, and their descendants, have been around for a while with tool support. The apps built with them are more reliable & had fewer coding vulnerabilities. Clearly, C is just badly designed in terms of robustness, productivity, and maintenance. No secret why: its lineage dumped most of the features of better languages to run on hardware [2] that couldn't support them with acceptable efficiency. Over time, we ditched that hardware for a series of others better suited to our needs. It runs so fast that the CPU is often idle 70-90+% of the time our apps are running. We should likewise ditch C and raw efficiency for something better, safer, and still efficient. Rust is a promising option among others.

Note: One trick in the past where the GUI, a legacy library, etc absolutely needed C or C++ was to split the app. Ada, particularly, made cross-language design easier to support this use case. So, the critical part of the logic might be written in a safer langauge, less critical in C or C++, and a careful interface glue them together. This was done for Mondex Certificate Authority, IIRC. I did it myself in numerous designs for fault-isolation or security reasons.

[1] http://www.adacore.com/knowledge/technical-papers/safe-secur...

[2] https://en.wikipedia.org/wiki/PDP-11

yorkedork · on Sept 23, 2015

This response is neither productive nor insightful.

First, it's probably not useful to arbitrarily claim what the biggest flaw in the design of C is without context or data.

Second, I don't think it's reasonable in this case to place the blame on the design of a language (certainly, the choice of C has long been a pragmatic decision). As Linus alludes to directly, the tools exist with which to write more correct, idiomatic C code for the kernel (e.g., ARRAY_SIZE and a useful gcc warning).

Finally, as the historical record should demonstrate, 'getting rid of X by replacing it with Y' is not particularly actionable; it certainly isn't the most efficient solution in terms of resources and likely not in terms of correctness over the short- to medium-term.

If I'm to hazard a guess, the point of his message was to remind everyone the biggest source of flawed code lies in our own hands and in the biases individuals and groups bring to large-scale development.

The choice, here, isn't between using a good language and a bad language; it's between using a language and its idioms correctly, and using those things lazily with foul consequences.

LBarret · on Sept 23, 2015

the choice is also between a good language and a bad one. Those two choices are not exclusive, and choosing good practices is not the end of the road for the quest for quality.

acconsta · on Sept 23, 2015

No. There's only one language you can write a patch for the Linux network stack in — C.

"Move to Rust!" is not a productive or helpful suggestion, unless you have an massive army of programmers ready to port Linux to Rust.

kibwen · on Sept 23, 2015

You can write a patch for the Linux network stack in any language you want. Whether Linus will merge it is a different matter. :P

EDIT: Downvotes eh? Then I suppose I'll state outright what I left implicit: Linux is not inextricably tied to any single language, and it's a strawman to suppose that any attempt to integrate Rust with Linux would first have to reinvent the universe from scratch. Rust has excellent interoperability with C (though, notably, unions are iffy), and it's common to write Rust code that gets called from C with C being none the wiser.

acconsta · on Sept 23, 2015

For the record, I don't have enough karma to downvote.

>Linux is not inextricably tied to any single language

It is pretty tightly coupled to GCC, which, as you know, is a C compiler. Some people are working on supporting Clang, but last I heard they haven't succeeded.

It would certainly be interesting to see someone link Linux to Rust code though.

zurn · on Sept 24, 2015

Linux has been built with small modifications under tcc, icc and llvm. Linus doesn't want it to be tightly coupled to GCC.

The LLVM build patchset is being actively maintained, here hare slides from February: https://events.linuxfoundation.org/sites/events/files/slides...

acconsta · on Sept 24, 2015

Yeah, I get the impression Linus fucking hates GCC :P

That's good though. Compiler freedom is better for everyone.

kibwen · on Sept 24, 2015

The C ABI is de facto standardized, the compiler in use doesn't matter. Rust code can expose a C ABI and be compiled into a linkable artifact, and that means that C code can use it as though it were an artifact compiled by any C compiler.

acconsta · on Sept 24, 2015

It's not quite plug-and-play, but it does seem like there's been some progress.

https://lwn.net/Articles/644681/

kachnuv_ocasek · on Sept 23, 2015

It's not a flaw IMO, it's just confusing. Especially for those who come from higher languages and don't read a solid intro into C. It's actually explained quite well in K&R.

Animats · on Sept 23, 2015

No, it's a design flaw. It's a consequence of K&R C for the PDP-11, which had to be crammed into a small address space machine and didn't even have function prototypes. In K&R C, which is one notch above assembly code, there was no cross-module type checking and everything was really an int or a float.

Even in the 1970s, this was kind of lame. Pascal could pass array sizes through a call.

pjmlp · on Sept 23, 2015

It is a big flaw to ignore what other language communities were doing.

Other systems programming languages older than C did it properly.

tosseraccount · on Sept 23, 2015

How many programs running on your computer were written in these languages?

pjmlp · on Sept 23, 2015

What does it matter?!? UNIX killed those languages.

AnimalMuppet · on Sept 23, 2015

And why? Because in practice, C on Unix was better for writing actual programs that actually worked to do things that people cared about. And that's why it matters: Because C was actually better in practice. Everybody seems to be in denial about that; maybe they should learn to deal with it.

Was that "actually better" due to C, or to Unix? Well, maybe some of both, but Unix was written in C, so C was useful for writing the operating system that you blame for C taking over the world...

If you don't understand why C was better in practice, you're going to fail in attempts to create something better, and never know why. You're just going to be ignored, while you whine about how your way was better.

pjmlp · on Sept 23, 2015

Completely wrong!

Back in those days we used to pay for our compilers, usually in values of thousands, remember those?

Only languages delivered as part of was then the OS vendor's tooling got used at most work locations.

It always required lots of persuasion to buy compilers for languages not delivered as a standard part of the OS.

So as UNIX managed to gain a foothold into the enterprise and universities, pushing mainframes and other OSes, C gained mind-share.

UNIX took over the world by accident, by having AT&T initially providing the code for free (which they repented later on) to universities, which happened to have persons like Bill Joy and Scott McNealy that got successful with their startups using the OS they enjoyed at their universities.

Had AT&T never given the code for free or those startups floundered, UNIX will be another footnote alongside MULTICS and friends.

Just like nowadays no one sane would use JavaScript, if the browsers had first class support for other programming languages.

AnimalMuppet · on Sept 23, 2015

There is an element of truth in what you say. You seem to think it's the whole story; I do not. Had C not been good in practice, "free" wouldn't have been enough.

Those OSes written in other system programming languages: Why didn't their computers take over the world? I mean, sure, AT&T was pumping money into Unix, but other companies were pumping money into the competition. Why did Unix win? It's not just because of evil AT&T. It's because Unix could deliver working features, and others struggled to keep up.

Why could Unix deliver working features? Yes, partly because of AT&T's money. But partly because C turned out to be really useful as a systems programming language.

These other languages you mentioned a few posts ago: Sure, they looked good on paper, only where's the beef? When it came time to deliver, what was written in them? More functionality was written in C, which is why Unix won.

I'll repeat my previous statement: If you don't understand why C was better in practice, you're going to fail in attempts to create something better, and never know why. You're just going to be ignored, while you whine about how your way was better.

History ignored your "better" languages. The world moved on from them, for good reasons. You think they were better, but in real life, they weren't.

pcwalton · on Sept 23, 2015

What can you do in C that you can't do in, say, Algol 68? Or even Pascal (once you get to a recent enough version supporting dynamic arrays/first-class pointers)?

I think the situation is not far off from what pjmlp is saying: there were several more or less equally good languages available at the time, and C won by virtue of being in the right place at the right time. There are lots of really nice things about C from a systems point of view, but most languages of the 70s had those too. C was a nice language for the late 70s (not so much today, IMO), but not hugely nicer than its contemporaries.

AnimalMuppet · on Sept 23, 2015

You can do it in those languages (at least you can in a modern Pascal; I don't know enough about Algol to say).

But remember, the original claim by pjmlp was "It is a big flaw to ignore what other language communities were doing. Other systems programming languages older than C did it properly." That doesn't apply to the Pascal you speak of, because at the time C arose, Pascal wasn't "once you get to a recent enough version supporting dynamic arrays/first-class pointers". It was Pascal with the size of an array being part of the type of an array. That's type safety, sure. But it also means (to use an example that I have personal experience with), if you're writing a numerical simulation on a 2D array, and you want to let the user specify how big the mesh is, and then allocate memory to hold the mesh, you have a problem. You can't define a variable-sized array at all in the Pascal of the time.

Now try to think about how you'd write a memory allocator in that. Good luck.

Another example: We were on an embedded system. To write to hardware registers, we had to call an assembly-language subroutine. In C, we would have simply used a pointer to an absolute memory address.

This is what C gave you - you could just do things without the language getting in the way. Yes, you could cut yourself on C's sharp edges, whereas Pascal protected you. But when you needed the sharp edges to cut something, Pascal didn't have them, and you were stuck.

Again, I don't know enough about Algol to meaningfully compare it to C. The Pascal of the day wasn't the answer, though.

pjmlp · on Sept 24, 2015

You are assuming hardware registers were available via memory mapped address. Many machines used only IO ports. Where are the C features for that?

Pascal was originally used for teaching it doesn't count. The first time it was used for writing an OS, was with the Object Pascal dialect for the Mac OS.

What counts were Algol 60, Algol 68, Algol W, PL/I, PL/M, Mesa among many others.

As for the statement of only C being usable without Assembly, check the B5000 from the Burroughs Corporation, developed in 1961.

EDIT: where => were

AnimalMuppet · on Sept 24, 2015

> You are assuming hardware registers were available via memory mapped address.

I'm not assuming it. The environment I was in had memory-mapped IO.

> Many machines used only IO ports. Where are the C features for that?

C compilers for machines that only had IO ports typically had a library function to do it. At least, by the era of Turbo C, they did. (The x86 architecture is the first one I was familiar with that did IO that way, and Turbo C was the first C compiler I had on it. I presume that, if earlier architectures did IO that way, C compilers for those architectures had similar capabilities, but I do not know that first-hand.

> Pascal was originally used for teaching it doesn't count.

I was replying to pcwalton, who asked what you can do in C that you can't do in Pascal. So it may not count to you (four comments upthread from here), but it counted to the comment I was replying to (two comments upthread from here). Perhaps your comments would be better addressed to him/her.

> As for the statement of only C being usable without Assembly, check the B5000 from the Burroughs Corporation, developed in 1961.

I didn't say that only C was usable without assembly. I said that using Pascal, in order to write to memory-mapped IO registers, you had to call a function written in assembly, and that in C you didn't have to do that. I never said only C was usable that way. Please stop putting words in my mouth that I didn't say.

pjmlp · on Sept 24, 2015

> I said that using Pascal, in order to write to memory-mapped IO registers, you had to call a function written in assembly, and that in C you didn't have to do that. I never said only C was usable that way. Please stop putting words in my mouth that I didn't say.

In what language do you think the Turbo C library functions for port IO and all those handy BIOS and MS-DOS calls were written on?

Assembly, of course.

As for doing memory mapped IO with Pascal, you could something like this in Turbo Pascal. Other dialects had similar extensions.

    var
       videoMem : array [0..255] of byte absolute $A0000;

    begin
      videoMem [0] := 12;
    end;

AnimalMuppet · on Sept 24, 2015

> In what language do you think the Turbo C library functions for port IO and all those handy BIOS and MS-DOS calls were written on? Assembly, of course.

Of course. But they still looked like just another function call in C, and they came with the compiler, so the programmer writing in C didn't care.

In my Pascal example, there was no such function, so we had to write it, so we had to care.

You seem to be consistently trying to make my words say things that I am not saying, and then arguing against positions that are only in your own mind. It's getting quite tedious.

nickpsecurity · on Sept 24, 2015

Pascal does count because people wrote real software in it, it was better than C, Pascal/P was one of the best portability hacks ever, UCSD P-system used it, and Modula-2 built a real OS with a lot of same features. That said, if people counter Pascal, we can remind them of its purpose as you did if the limitation was due to that purpose. Modula-2 is my goto reference in these discussions because it's closest to C's niche: resource-constrained, system programming w/out garbage collection. And better than C w/ successors that were all better than C. :)

"As for the statement of only C being usable without Assembly, check the B5000 from the Burroughs Corporation, developed in 1961."

Good call. First great, overall system as I see it. Might also remind them that Wirth's and Jurg's Lilith workstation ran on Modula-2 and assembly. Most comparable to UNIX in terms of hardware and personnel constraints. Shows even constrained teams could do better than C or UNIX. The Oberon systems used Oberon and assembly, as well, with a GC'd OS and software. That helps in another recurring debate about OS's in "managed code." ;)

So, yes, that C is necessary for systems programming is a long-running myth refuted by examples which were better because they didn't use it.

nickpsecurity · on Sept 24, 2015

He's close to the whole truth. All of you talking about this would know if you merely looked up the history of C language and UNIX. C's lineage is as follows: ALGOL60->CPL->BCPL->B->C. Compare ALGOL60, PL/I, Pascal, or Modula-2 to C to see just how little C did for people. Why did they take all the good features features out and introduce the dangerous alternatives? They needed something whose apps and compilers would work on their PDP-11 with easy reimplementation. That's it.

Note: Nicklaus Wirth's solution to same problem was much better: P-code. He made idealized assembler that anyone could port to any machine. His compiler and standard library targeted it. Kept all design advantages of Pascal with even more implementation simplicity than C. Got ported to something like 70 architectures/machines.

Now, for OS's. Let's start with Burroughs MCP. The Burroughs OS was written in a high-level language (ALGOL variant), supported interface checks for all function calls, bounds-checked arrays, protected the stack, had code vs data checking, used virtual memory, and so on. That's awesome and might have given hackers a fight!

Later on, MULTICS tried to make a computer as reliable as a utility with a microkernel, implementation in PL/0 to reduce language-related defects, a reverse stack to prevent overflows, no support for null-terminated strings (C's favorite), and more. It was indeed very reliable, easy to use, and seemed easy to maintain. You'd have to ask a Multician to be sure.

So, the OS's were comprehensible, used languages that made reliability/security easier, had interface/array/stack protections of various sorts, consistent design, and all kinds of features. Problem? Mainframes were expensive. The minicomputers Thompson and Ritchie had were affordable but their proprietary OS's were along lines of DOS. You can't do great language or OS architecture on a PDP-11 because it's barely a computer. It would still be useful, they thought, if it had just enough of a real language and OS to do useful work.

So, they designed a language and OS where simplicity dominated everything. They took out almost all the features that improved safety, security, and maintenance along with using a monolithic style for kernel. Even the inefficient way UNIX apps share data was influenced by hardware constraints. The hardware limitations are also why users had to look for executables in /bin or /sbin for decades: original machine ran out of space on one HD & so they mounted another for rest of executables. All that crap is still there because fixing it might break apps & require fixing them. Curious, did you think they were clever design decisions rather than "we can't do something better without running out of memory or buying a real computer so let's just (insert long-term, design, problem here)?"

The overall philosophy is described in Gabriel's Worse is Better essay:

https://www.dreamsongs.com/RiseOfWorseIsBetter.html

As Gabriel noted, UNIX's simplicity, source availability, and ability to run on cheap hardware made it spread like a virus. At some point, network effects took off where there's so many people and software using it that sheer momentum just adds to that. Proprietary UNIX's, GNU, and Linus added more momentum. After much turd polishing, it's gotten pretty usable and reliable in practice while getting into all sorts of things. One look underneath it shows what it really is, though, with not much hope of it getting better in any fundamental way:

https://queue.acm.org/detail.cfm?id=2349257

So, aside from not knowing history, there seems like there's not even a reason to debate the reason behind bad design in C and UNIX at this point aside from merits of overall UNIX architecture vs others. The weaknesses of C and UNIX were deliberately inserted into those by the authors to work around hardware limitations of their PDP-11. As those limitations disappeared, these weaknesses stayed in the system because FOSS typically won't fix apps to eliminate OS crud any quicker than IBM or Microsoft will. Countless productivity, money, and peace of mind were lost over the decades to these bad design decisions in the form of crashes or hacks.

Using a UNIX is fine if you've determined it's the best in cost-benefit analysis but let's be honest where the costs are and why they're there. For the Why, it started on a hunk of garbage. That's it. Over time, when it could be fixed, developers were just too lazy to fix it plus the apps depending on such bad decisions. They still are. So, band-aids everywhere it is! :)

emodendroket · on Sept 23, 2015

Well, by this metric, (insert Java, bat files, JavaScript, whatever thing you hate here) is a fantastic language too.

pcwalton · on Sept 23, 2015

> It's not a flaw IMO, it's just confusing.

That is a flaw.

klodolph · on Sept 23, 2015

There are a ton of "design flaws" in C, including integer promotions, string processing which is hopelessly error-prone (and people who think that strncpy() fixes anything), a preprocessor which is basically its own language, the mere existence of gets(), et cetera. There are many programming patterns in C which produce buffer overflows, a lack of array parameters is only one source.

acconsta · on Sept 23, 2015

>We need to move forward to Rust

In fairness, C++ containers know their size. And Rust's solution to buffer overflows is the same as every other language — run time bounds checking.

nostrademons · on Sept 23, 2015

Most of the time, you'll be accessing Rust collections through iterators, and iterators fold the bounds-check into the termination condition. There's no additional overhead here; it compiles into the exact same code that the C would.

acconsta · on Sept 23, 2015

If you're using iterators, you don't have to worry about buffer overflow. But every A[i] random access needs to be bound checked.

FreeFull · on Sept 23, 2015

Although the bounds check can also be elided if the compiler knows i < len

acconsta · on Sept 24, 2015

Right. I think that's mostly immutable containers though, right? Or is the compiler smarter than that?

wspeirs · on Sept 23, 2015

> And Rust's solution to buffer overflows is the same as every other language — run time bounds checking.

Right, because that's the only way to do it? What's wrong with that?

The point is a language like Rust has built-in bounds checking...

Animats · on Sept 23, 2015

The trend is towards optimizing out bounds checks for at least the easy cases, such as FOR loops. Go does this. Rust should, and probably will soon. That tends to get most of the inner loops where it really matters, like a matrix multiply. C++ can't do that because the compiler doesn't know that a template-implemented bounds check is a bounds check.

Also, in C++ containers, ".at()" is usually checked, but "[]" is not. So C++ code still regularly has buffer overflow problems.

pcwalton · on Sept 23, 2015

> Go does this. Rust should, and probably will soon.

LLVM does optimize these out, and in more cases than Go does.

> C++ can't do that because the compiler doesn't know that a template-implemented bounds check is a bounds check.

It doesn't need to. SCCP is a very general optimization. GCC and LLVM optimize out bounds checks in more cases than Go can do.

logophobia · on Sept 23, 2015

Rust does optimize iterators, just not random access. You can even turn off the bound checks with get_unchecked, if you really really need unchecked random access.

steveklabnik · on Sept 23, 2015

(and for loops use iterators under the hood)

chriswarbo · on Sept 23, 2015

> Right, because that's the only way to do it? What's wrong with that?

Maybe the fact that it's wrong? Compile-time bounds checking is such a classic "hello world" example for type checkers that it's become a cliche (eg. https://www.quora.com/What-are-some-good-examples-of-practic... )

Dependent types make zero-overhead bound checking trivial. More recently, less powerful type systems like liquid types have been proposed (because reasons). Zero-overhead bound checking is usually on their feature list, since a) it's ubiquitous b) it's easy.

acconsta · on Sept 23, 2015

Great! Please implement them in Rust as soon as possible.

But until then, Rust's approach is runtime bounds checking.

chriswarbo · on Sept 24, 2015

I don't know about Rust, I've never used it (and I rarely use C). However, I do use dependently typed languages every day, so I was replying to refute the "that's the only way to do it" argument.

acconsta · on Sept 24, 2015

For... building software? I thought dependent types haven't escaped research languages.

okasaki · on Sept 23, 2015

So does C++: http://www.cplusplus.com/reference/stdexcept/out_of_range/

acconsta · on Sept 23, 2015

>The point is a language like Rust has built-in bounds checking...

As does C++. See e.g. vector.at and libstdc++ debug mode:

https://gcc.gnu.org/onlinedocs/libstdc++/manual/debug_mode.h...

The difference is only opt-in vs. opt-out. To its credit, Rust provides additional protections against iterator invalidation and dangling pointers, but its approach to buffer overflows is the same.

icholy · on Sept 23, 2015

Can't you turn them off though?

kibwen · on Sept 23, 2015

You can opt out of bounds checking by using the unsafe `get_unchecked` method instead of the array indexing syntax. There's no compiler flag to turn off bounds checking, because the language doesn't want to encourage memory safety that's dependent only on compiler flags (and also because the cost of runtime bounds checking in real programs is indistinguishable from noise).

acconsta · on Sept 23, 2015

>the cost of runtime bounds checking in real programs is indistinguishable from noise

Can you show us the benchmarks for that?

kibwen · on Sept 23, 2015

The source of my assertion comes from repeated conversations with the Servo developers, who have written many hundreds of thousands of lines of Rust in a project whose goal is to be twice as fast as modern web browsers (in other words, they are comparing themselves to programs written in C++, and speed matters). They regularly profile in pursuit of tracking down inefficiencies, and bounds checking has never been even a blip on their radar. Quotes from the exchange I've had just now with pcwalton: "I haven't done rigorous benchmarking but I have never seen it in instruction level profiles [...] I'd rather just spend my time shipping software that uses fast bounds checks and prove it that way :) [...] my point is simply that the delta between idiomatic Rust code that uses iterators and no bounds checks and the idiomatic Rust code that uses iterators and bounds checks is incredibly small since bounds checks are so rare to begin with" (http://logs.glob.uno/?c=mozilla%23servo#c271807)

As for actual benchmarks, I think they would actually be quite easy to produce. There's exactly one line in the Rust stdlib that provides bounds checking for indexing, right here: https://github.com/rust-lang/rust/blob/master/src/libcore/sl... . All you would need to do is take out that line and compile Servo with your modified stdlib and compare the results of running the built-in benchmarks. I may just do this myself as a blog post. :)

acconsta · on Sept 24, 2015

OK, data definitely helps. It's not always easy to predict what modern CPUs will do!

For the record, there seem to be a lot of bounds checking independent of indexing:

https://github.com/rust-lang/rust/blob/master/src/libcollect...

kibwen · on Sept 24, 2015

You just linked to the same line four times. :P

acconsta · on Sept 24, 2015

Oh, oops. 471, 508 679, 766.

kibwen · on Sept 24, 2015

Given that these are all achieved with the `assert!` macro, we can fortunately just redefine the macro to be a no-op in order to determine the runtime cost of all assertions in the standard library (note that assertions that aren't required for memory safety should already be using the `debug_assert!` macro, which is in fact compiled to a no-op in non-debug builds). This will overestimate the impact of removing bounds checks (since we'll potentially be removing lots else as well), but I'm curious to see if the performance impact will still be negligible regardless.

acconsta · on Sept 24, 2015

Me too. I've been thinking about doing an analogous benchmark in C++ for a while.

steveklabnik · on Sept 23, 2015

Not with a flag or anything. As said above, if you use iterators, it's not generally an issue, or you can call an unsafe function that doesn't do the check.

blazespin · on Sept 23, 2015

The biggest flaw is programmers who ignore warnings.

nickpsecurity · on Sept 23, 2015

Didn't even know you had a paper on the subject. Going in my collection. I like how it mitigates so much risk with so little change or overhead. Good work. Paper mentions dangling pointers are a risk. They can be detected with other technology with even more in the works:

http://research.microsoft.com/en-us/um/people/marron/selectp...

So, out of curiosity, what's the odds of something like this proposal getting adopted in a standard, GCC, or anything else?

Animats · on Sept 24, 2015

It's a dead proposal. It was discussed on "comp.std.c" back in 2012, and that's the third draft, after fixing all the problems everyone had found. By that time, nobody could find a serious hole in it. While it's technically feasible, the political effort to push it forward is more than I can do.

Maybe if Amit Yoran was still at Homeland Security...

nickpsecurity · on Sept 24, 2015

Damn. Well at least you tried and anyone with more political might has something that might work in the future. That's how 3/4 of my designs work. ;)

ArkyBeagle · on Sept 23, 2015

That's no bug. One can use no array arguments and write perfectly safe code. MOST of the "this is unsafe" examples are just bad code and have been from the beginning. Doctor, doctor, it hurts when I do that...

It's an axiomatic choice inherent to the language. You add a parameter representing the extent of the array, and respect that.

I have no heartburn with Rust, but there's sure a lot of legacy 'C' out there.

meesterdude · on Sept 23, 2015

I often feel a bit guilty when I spent time looking through something instead of just giving it a quick glance; but knowing that Linus has to fight the same fight really gives me confidence that it's better to know whats going on than to just ship code out the door.

Someone1234 · on Sept 23, 2015

That was actually darn polite for Linus.

mikekchar · on Sept 23, 2015

It's really quite a good post, though I suppose it is embarrassing for the people involved. It spells out clearly what the problem is (i.e. what bugs have been introduced and what are likely to be introduced), what actions he wants changed in the future (i.e. don't ignore new compiler warnings) and says that he is not satisfied with the level of quality. He even indicates specifically which people he wants to pay particular attention to the issue.

That he can get away with saying these things shows that he has built a good team. I know I am not alone in experiencing the frustration of pointing out problems only to be greeted with, "Yeah, whatever. That's just your opinion." Or conversely having people complain vociferously about something without knowing the actual problems it will create (very often it is merely "You aren't doing it my way!").

I think the important thing is that, no matter how OTT Linus's posts can seem to someone outside the process, he has built trust. His complaints are very nearly always, "You are going to break X" and almost never "You aren't doing it my way". Even if it is difficult to understand, the people working on the kernel trust him and will accept the criticism.

Just musing here, but I think that if you want to build something of quality, you need a person on the team who is willing to be critical of failure. But that person has to engender trust in the rest of the team. It's quite a difficult balance because nobody likes to be criticised (or have their work criticised). Getting it right is hard and is probably the biggest failure I see of people trying to make the jump from being a good solid dev to being a good leader of people.

jordigh · on Sept 23, 2015

Yeah, "bovine manure" instead of "bullshit". He censored himself!

fit2rule · on Sept 23, 2015

Yes, I was going to say something along those lines too .. seems our Benevolent Leader has gotten a bit soft. Frankly, the use of array function arguments in code like this would set me off into a minor rage .. I'm impressed at Linus' restraint this time around.

Whats interesting though is that we're still dealing with these kinds of issues, even now in the 21st Century, from C programmers who should know better. Seems that the more things change the more they stay the same ..

exelius · on Sept 23, 2015

No, the C programmers who should know better have retired. Everyone else is just trying to patch code written 10-15 years ago while modifying as little of it as possible...

pjmlp · on Sept 23, 2015

We still have UNIX so it won't change.

vmorgulis · on Sept 23, 2015

Thorvalds still runs Fedora (22) and the GCC warning is a new one coming from clang.

https://lkml.org/lkml/2015/9/3/494

https://lkml.org/lkml/2015/9/3/499

zwp · on Sept 23, 2015

"I tried - and failed - to come up with a reasonable grep pattern"

Is there a nice, light, better-than-betterthangrep[1] static analysis tool that would help with this sort of question? (I suspect the decay to pointer would, for example, elide this detail from llvm's IR?).

[1] http://beyondgrep.com/

klodolph · on Sept 23, 2015

Create a new compiler warning. My guess is that it wouldn't be that difficult to add: add a warning for using sizeof() on a value whose type is the result of array-pointer decay. If you really want the size of the pointer, then you can change the function parameter's type to the pointer type, or use a cast.

nitrogen · on Sept 24, 2015

There is/was a refactoring DSL for parsing and modifying C code accurately, but I don't recall the name. If I remember correctly, it was produced by a French organization or university.

krajaratnam · on Sept 24, 2015

You're probably thinking of http://coccinelle.lip6.fr/sp.php?

nitrogen · on Sept 24, 2015

Yes, I think that's right.

chris_wot · on Sept 24, 2015

Clang plugins are pretty cool. The LibreOffice project uses them reasonably extensively.

anthay · on Sept 24, 2015

I've made this mistake in the past. In C++ I used a reference to fix it like so:

    #include <cassert>

    typedef char array_xyz_type[42];

    void test(array_xyz_type & a, array_xyz_type b, char c[42], char (&d)[42])
    {
        assert(sizeof(a) == 42);
        assert(sizeof(b) == 4);
        assert(sizeof(c) == 4);
        assert(sizeof(d) == 42);
    }

    int main()
    {
        array_xyz_type x = {1,2,3};
        test(x, x, x, x);
    }

I'd add these comments:

1. The reference makes the code somewhat less flexible in that you can't call the function with anything other than a char[42] type, e.g. test(new char[42]...) won't compile. (This may or may not be what you want.)

2. As Linus said, don't ignore compiler warnings.

3. Write some code, then watch every line execute in the debugger so you see if what you wrote is what you thought you wrote. Repeat.

pcvarmint · on Sept 23, 2015

C does have pointers to arrays:

  static bool rate_control_cap_mask(struct ieee80211_sub_if_data *sdata,
                                   struct ieee80211_supported_band *sband,
                                   struct ieee80211_sta *sta, u32 *mask,
                                   u8 (*mcs_mask)[IEEE80211_HT_MCS_MASK_LEN])

   for (i = 0; i < sizeof(*mcs_mask)/sizeof(**mcs_mask); i++) {
     // Use (*mcs_mask)[i]  or  mcs_mask[0][i]
  }

acconsta · on Sept 23, 2015

Does anyone know why array arguments decay to pointers?

syncsynchalt · on Sept 23, 2015

Because that's how the machine sees C arrays. It would take an extra (invisible) parameter on the stack to implicitly pass the array size into the function, a thing which is very much against the spirit of C.

The only real surprises/magic in C are things like:

  - addition/subtraction on typed pointers has an implicit multiply by sizeof(type)
  - type conversion between floats and ints

maximilianburke · on Sept 23, 2015

> Because that's how the machine sees C arrays. It would take an extra (invisible) parameter on the stack to implicitly pass the array size into the function, a thing which is very much against the spirit of C.

It would only require a separate parameter if you needed to pass arbitrary-sized arrays. In this example, if the prototype contains the array size:

    int foo(char array[30]);

It would not be difficult for the compiler to only allow arrays of that length to be passed as parameters, and no invisible parameters are necessary, assuming also that the function implementation made the same assumption of array length.

Developers who handle arrays of arbitrary length tend to already encode a length parameter, it's this case where it looks to the developer like they are encoding a length value that is the deceptive/problematic case.

mortehu · on Sept 23, 2015

This is one of the many uses of the static keyword in C99:

  void ten(char foo[static 10]) { }

  void generates_warning() {
    char foo[3];
    ten(foo);
  }

This causes clang to emit the following warning:

> warning: array argument is too small; contains 3 elements, callee requires at least 10

merlincorey · on Sept 23, 2015

The warning is true - it is "at least" 10:

    char ten(char foo[static 10]);

    char ten(char foo[static 10]) { return foo[1]; }

    void generates_no_warning(void);

    void generates_no_warning() {
        char foo[30];
        ten(foo);
    }

    int main() {
        generates_no_warning();
    }

Modified also so it would build cleanly with -Weverything -Werror

uxcn · on Sept 23, 2015

It avoids stepping on memory elsewhere in the program though. My personal preference for C99 and above is to pass the length as a parameter.

    int do_stuff(size_t len, int64_t arr[len]);

It still suffers from the pointer decay problem, but it at least forces whoever is reading/writing the code to acknowledge the length.

uxcn · on Sept 23, 2015

How does that parse? I thought static was only a storage class.

syncsynchalt · on Sept 23, 2015

When extending the language, the committee will often try to re-use existing reserved keywords to avoid colliding with a variable name in someone's code.

Thus we have yet another overloading of "static" in C, the use of "auto" in C++11 (which worked out quite well), and interesting type names such as "long long" in both.

When the committee breaks this rule (think nullptr in C++11) they first must choose a name which is suitably rare (and often a bit clunky as a result), they then scan for that name in large code corpuses, and evaluate how much work they are making for the maintainers of any codebase that's going to be compiled against a new compiler (and which may need to be modified to do so).

Some languages (I'm thinking JS specifically) have keywords that are not in use but are still reserved in case they may be useful in the future.

geocar · on Sept 23, 2015

> It would not be difficult for the compiler to only allow arrays of that length to be passed as parameters

It would be extremely difficult for the compiler, without changes to the C spec and most ABIs, to simultaneously provide that protection for heap-allocated objects, for example:

    foo(malloc(30));

> Developers who handle arrays of arbitrary length tend to already encode a length parameter, it's this case where it looks to the developer like they are encoding a length value that is the deceptive/problematic case.

The most common arrays of arbitrary length in C programs are almost certainly C "strings" which are usually 0-terminated instead of including a length value.

Athas · on Sept 23, 2015

> Because that's how the machine sees C arrays. It would take an extra (invisible) parameter on the stack to implicitly pass the array size into the function, a thing which is very much against the spirit of C.

In this case, the size is specified as a compile-time constant, however, which seems like it could work. All it would do is change the semantics of sizeof(), really.

Also, I am doubtful whether it would really be a problem to have "invisible" parameters. C doesn't even specify a stack as I remember it, and compilers are certainly free to pass as many function parameters in registers as they are able to.

syncsynchalt · on Sept 24, 2015

True, someone else pointed out that this could all be done as compile time checks, which would have no runtime/ABI impact.

There are interesting stories from the early days of C when it was first codified that say a lot about the whys and hows of the language that we have today. In particular I'm thinking of the fact that the compiler and what we now think of "compiler warnings" came from two different warnings (cc vs lint). This is because the C compiler was the largest C program and generally took every bit of memory available to do it, there was no room for any kind of sanity checking on that code. I've heard it said that to include a new feature "n" in the language, they generally had to rework at least some of the existing compiler code with feature "n-1" to simplify/shrink it enough to fit the new feature.

In that kind of environment you get a language with very little input checking, beyond the rules of the grammar itself (which was as small as possible).

geocar · on Sept 23, 2015

It is an ABI change.

acconsta · on Sept 23, 2015

But the size of C89 arrays is fixed and known at compile time. Why would the length need to be passed?

klodolph · on Sept 23, 2015

It's not necessarily known by the caller, since the array might be declared with an unspecified size and defined in a different translation unit.

acconsta · on Sept 23, 2015

"might" is the key word there though. What if the array has a statically known size, like the code sample in the email?

I suppose it could just be for consistency.

GuamPirate · on Sept 23, 2015

Well, the information isn't copied... so if you modified the incoming array you modify the original, so in that sense it has to be a pointer. Arrays and pointers are analogous in C except when they are declared, as there is an implicit allocation of information. Since we don't want this when it comes to array arguments, it simply decays.

doppelganger27 · on Sept 23, 2015

In C (and in all other languages I can think of), an array is actually a pointer to a bunch of memory. Indexing into the array calculates offset from the array pointer, and grabs the item at that offset.

36erhefg · on Sept 23, 2015

Array is not a pointer. Your description of indexing is not correct, neither is the phrase "array pointer" you invented. The machine code produced will calculate the offset differently for a pointer and an array.

doppelganger27 · on Sept 23, 2015

Let me clarify with an example. Suppose you have an array:

  int a[10]

There is a section of memory of size sizeof(int) * 10 somewhere, and the variable a is a * int that points to that section. When someone does this:

  int x = a[2]

It is equivalent to:

  int x = *(a+(sizeof(int)*2))

When I said "array pointer" in my previous comment, all it meant was the memory address that the variable (which is an array) points to (in the example, the variable "a")

Edit: missing paren

36erhefg · on Sept 23, 2015

a is a label, or an alias of a memory address. It is not a pointer. ( Your third example has a mistake, the correct increment is: a+2, because pointer arithmetic increments in object size not byte size )

Read this:http://eli.thegreenplace.net/2009/10/21/are-pointers-and-arr...

doppelganger27 · on Sept 23, 2015

Interesting read. Thanks :)

nickcw · on Sept 23, 2015

FYI the equivalent Go code would actually copy the whole array on the stack, and so it would be valid if possibly inefficient depending on the size of the array.

ashearer · on Sept 23, 2015

...and for anything larger than a small fixed-size array (like the example), you'd avoid data-copying overhead by passing a slice, which is a bounds-checked bundle of pointer and length.

esaym · on Sept 23, 2015

Because as Linus stated: "array arguments in C don't actually exist"

The bracket notation ex: myarray[2] is actually just a sugar syntax for: * (myarray+2)

An array is nothing more than an address and a (hopefully) know length. Not really a "data structure" or object by no means.

You could definitely roll your own though with something like:

  struct  {
    char * arr,
    int size
  } my_struct;
  
  for(int i = 0; i < my_struct.size; i++){
    printf('%x', my_struct.arr[i]);
  }

  ect,

acconsta · on Sept 23, 2015

Yeah. But why is that the behavior specified by the standard?

pjmlp · on Sept 23, 2015

Because C designers decided to ignore other saner systems programming languages and thought it was a cool idea to do that.

No, C wasn't the first systems programming language used to write OSes.

augustk · on Sept 23, 2015

Because the compiler gets the length of an array from its declaration. However, inside a function the compiler cannot tell the size of the actual array parameter.

acconsta · on Sept 23, 2015

If I write:

void example(int a[10]);

Why can't the compiler tell a is an int array of size 10 instead of a pointer?

36erhefg · on Sept 23, 2015

Because a is a pointer and not an array. That is by design.

When you pass an array to this function you are not copying the entire array, C is pass by value, therefore you must pass a pointer. In fact a behaving like an array would be lying to the programmer since you didn't make a copy of the array.

acconsta · on Sept 24, 2015

You can pass structs by value. Why not arrays?

36erhefg · on Sept 24, 2015

Your answer is merely a simple online search away.

acconsta · on Sept 24, 2015

What's weird is five hours after asking that question, no one has provided a reasonable explanation of why C requires arrays to become pointers on function boundaries.

My "online search" found Walter Bright, who knows a thing or two about C compilers, wondering the same thing:

https://news.ycombinator.com/item?id=7572011

36erhefg · on Sept 24, 2015

Improve your searching capabilities.

https://programmers.stackexchange.com/questions/245602/why-c...

( https://stackoverflow.com/questions/7454990/why-cant-we-pass... )

And partially this thread.

As for Walter; Stop using C like you want it to be used, and instead how it is meant to be used. The article takes a strawman position. Anyone who takes their time and learns C, knows that arrays decay.

acconsta · on Sept 25, 2015

So it comes down to historical compatibility with B? Wow.

doppelganger27 · on Sept 23, 2015

There is no strict difference between a pointer and an array (of any size) at run time in C. Arrays are just pointers to a section of memory, and it's up to the programmer to communicate how big that section is.

36erhefg · on Sept 23, 2015

There are whole separate chapters describing arrays and pointers in the C Standard. This is by definition a strict difference. You should read those chapters and please stop spreading this beginners misconception of C.

on Sept 23, 2015

[deleted]

36erhefg · on Sept 23, 2015

In any case, there are significant differences at runtime as well. Look at the link I posted at your other comment.

geocar · on Sept 23, 2015

sizeof(a) is sizeof(int*) in your example because of the C standard: That is to say this is a bug in the standard, and not a bug in any particular compiler.

acconsta · on Sept 23, 2015

Yes, but why does the standard have that bug?

geocar · on Sept 23, 2015

Probably because the compilers that were standardized had that bug.

zamalek · on Sept 23, 2015

In the age and day of static analysis, someone actually took the time to question the rudimentary analysis the compiler does?

It boggles the mind. I'm with Linus on this one.

chengiz · on Sept 23, 2015

What is the warning?

myth_buster · on Sept 23, 2015

It's in the subsequent response [0] from Linus.

[0] https://lkml.org/lkml/2015/9/3/494