Deconstructing K&R C (2010) (learncodethehardway.org)
149 points by experiment0 on Jan 5, 2013 | hide | past | web | favorite | 128 comments

Shaw mischaracterizes K&R by making anachronistic assumptions about its intended audience and ignoring the context in which it was written.

When it was written, a beginning C programmer was most likely coming from a background in assembly and accessing the computer as a professional in the workplace or a student with substantial privileges. The intended audience was sitting near the cutting edge and was assumed to be sophisticated. Data validation could be left as an exercise for the reader in good conscience.

This audience is distinctly different from those who learned programming by typing code from magazines, and from Shaw's current audience, for whom he is simulating that experience.

Editorially, K&R has chosen to remain a slender tome. It has let others create fat cookbooks and "for idiots". Forty years on, Shaw criticizes the Wright Flyer by the criteria of Second World War aviation.

We don't hold K&R on a pedestal because of its pedagogical methods, but because of the power of the language it describes. The C Programming Language was a byproduct of creating a language.

Kernighan and Ritchie were programming. Their book is properly judged by different standards than Shaw's educational project.

None of which is to suggest that Shaw's project may not achieve a comparable level of esteem.

He mentioned that the intended audience was different, and even talks about it a little -- what he objects to is that people learning C are being pointed to it in a way that removes all of that context. Especially, it seems like holding K&R up as a paragon of "style" offends him.

That's what he means when he says it should be relegated to 'history.'

I'd submit that K&R is the fastest way to learn C if you knew nothing about C, or the best way to acquire the Zen of C, and that for those purposes it's still second to none, and its conciseness is a virtue ... but that Zed might be right about people pointing to it as a paragon of style to newbies who hope to write it professionally someday.

In fact, I'm disappointed that Zed let this project drop. Seemed neat.

You can always read and benefit from an old text without anachronistic mistakes, and any text written today will also be obsoleted quickly. It is usually better to read the masters than the pupils.

>> In fact, I'm disappointed that Zed let this project drop. Seemed neat.

I suppose it's not very interesting to bash someone's good work. I think the most important things he said are in "An overall critique".

I think Shaw's current project is much better. In part because it has a more positive agenda and therefore is freer to define itself - it's hard to logically justify Learn Python the Hard Way based upon the perception that there are shortcomings with K&R.

> Shaw mischaracterizes K&R by making anachronistic assumptions about its intended audience and ignoring the context in which it was written. [...] Their book is properly judged by different standards than Shaw's educational project.

So you're basically conceding the main point: K&R is obsolete and unsuitable for education. I beg to differ. It's still a very good read that stands apart from today's clumsy, overwrought introductory textbooks. It can teach you a lot of tricks that are not easily found elsewhere, and does so at a speed that allows absorbing the whole thing in much less than a semester.

I said what I said, and I didn't say K&R was obsolete. Obsolescence of The C Programming Language is Shaw's claim. The aviation metaphor was intended to illustrate the anachronism underpinning Shaw's analysis. It is not a simile about technological progress.

What has changed is the size of the audience for C language learning materials. Until GNU/Linux, obtaining a C compiler required substantial effort for a typical computer user with AmigaDOS, MacOS, or Windows 3.x and 9600 baud bandwidth.

K&R uses examples to illustrate points. It was never intended to teach the art of computer programming. It recognizes that there are better resources for that - though I suppose someday somewhere someone will criticize Knuth for not providing a pseudo-code compiler on his website.

> So you're basically conceding the main point: K&R is obsolete and unsuitable for education.

I disagree. I think OP was just saying that sometimes you might need more than one book, given that the original audience had pre-existing knowledge that tended to fill in the gaps. This might appear to be unsuitability "for education" within a popular marketplace of "Learn X in 24 hours" books, and make no mistake that this is the marketplace that Shaw is going after, so it's really a quibble about the level of expertise of your target market. "Rubes can't learn C from K&R." Yeah, well they never have.

If you also look at this metaphor

>Shaw criticizes the Wright Flyer by the criteria of Second World War aviation

it's pretty clear that the OP considers K&R obsolete and unsuitable for today's needs. It's not just that "sometimes you might need more than one book" - you wouldn't want to put the Wright Flyer in the same formation as a Flying Fortress.

K&R can teach you a lot of tricks. It also fails to teach you good security practices. In today's networked world, if your code has any chance of being attacked, I would rather you be careful about security than merely know tricks.

> Shaw mischaracterizes K&R by making anachronistic assumptions about its intended audience and ignoring the context in which it was written.

Zed does acknowledge as much but this is worth pointing out. For some reason contextual intent and "intended audience" are missed by programmers (who are typically stereotyped for giving answers at a technical depth irrelevant to the recipients).

With that in mind, I think that describing the intended audience and context (as you and others in the comments have done) is a more valuable exercise. It makes K&R (or any other book) accessible and applicable to a new generation. Such posts won't normally get much attention - they lack a certain rebellious and inflammatory flair we know and love so well...

I don't know that this particular case is so much "contextual intent and intended audience" failure as it is that K&R C is suggested, unilaterally, for anyone learning C. It's the bible, and you can't hand someone a bible and then tell them that page 37 line 23 characters 3 and 4 need to be swapped, or you'll summon Satan. They won't remember it when they read that line, nor after they've passed it, they'll just go ahead and summon Satan because the bible told them to. Multiply by 1000 suggestions to warn someone about C.

It's basically impossible to adequately warn people that K&R C is unsafe when exposed to the real world, so it's an unsafe suggestion in the way it's usually suggested.

I think he's also not giving enough emphasis to the technical realities of the time. When K&R was written, bytes were EXPENSIVE, tracking the size of a string that had to be longer than 255 characters using anything but end-termination would result in every string wasting precious bytes.

Everyone agrees that NULL-terminated C strings are the wrong thing to do now, this is why virtually every modern language (including Go, which is obviously very C-inspired) splits a string into data and length as separate entities (even if this separation is mostly hidden from the language user).

But when you only have a handful of thousands of bytes for an entire system, you just have to accept some amount of unsafeness as a practical reality. Should K&R be amended with a warning lest it mislead people working on modern systems? Maybe. But I don't think it is fair to code review it using modern thinking about the expense of different operations and the modern luxury of vast amounts of memory.

> we will be modernizing the code in K&RC [...] It will be more verbose, but it will be clearer

It's been at least a year since the article first showed up on HN, and the author still hasn't made good on this promise. It's actually hard to do, because verbosity and clarity are usually at odds with each other.

One of the main attractions of K&R C is exactly its lack of verbosity, its extraordinarily high content-to-length ratio. In a very short space you learn to do a lot of sophisticated programming, and most if not all code examples fit on one page or less.

Of course, optimizing for conciseness has its costs, as anyone who has debugged segmentation faults knows. So you avoid some of this shooting-yourself-in-the-foot that C is infamous for by using various crutches: add an extra argument for some functions, build your own Pascal-style strings etc. And if you pass in external input then you should definitely use some of them, such as strlcpy (which is actually preferable to the four-argument function that this article is getting to).

But there are also lots of cases where plain old strcpy will do fine, and for simplicity's sake it's better to use it. I believe one of these cases is a learning experience in which you want to get the big story as soon as possible, and are willing to wait until later to get acquainted with the inevitable caveats and detours.

>> One of the main attractions of K&R C is exactly its lack of verbosity, its extraordinarily high content-to-length ratio. In a very short space you learn to do a lot of sophisticated programming, and most if not all code examples fit on one page or less.


The same goes for "Programming Erlang" by Armstrong.

On the contrary: "Erlang and OTP in Action" is quite boring (in comparison with the book by Armstrong). It's definitely not a book for "enlightenment" but for practice, sometimes dull: "Do it this way; you don't need to understand why, you'll get accustomed to it in the future."

I think Zed is right to point these errors out - there are quite a lot of issues with edge case inputs and undeclared assumptions that people definitely need to hear about.

That said, he appears to be dealing in absolutes too much. If you care about performance (and let's face it, if you're using C you do, otherwise you probably shouldn't be using C) then sometimes you can't handle as much error checking or error correcting as you'd like.

In games (where most of my experience is), it's common to have functions that are 'unsafe' by his definition, but that are hidden in compilation units and not exposed in the header, so that the programmer can control exactly where they're called from. If you have a limited number of 'gatekeeper' interface functions that are actually called from outside the module, and these check/sanitise/assert on their inputs, then the internal private functions can safely assume that they have valid input and just run as quickly and as simply as possible.

There are many other good reasons to use C besides performance.

For instance:

  - cross platform portability

  - predictability

  - reliability

  - long term stability / maintainability

  - low run-time overhead, fine grained control of memory

Don't forget the people.

Long-time C coders are among the best programmers around. It's hard to understand just how damn good they are until one has gotten a chance to work with one or more of them.

The good ones know all levels of the hardware and software stack they're working with. Coupling this knowledge with the raw power of C, they can put together amazing and resource-efficient software in very little time, yet without sacrificing maintainability, security, portability and the other factors you've listed.

These are truly the people who make the impossible become possible.

If this is true, then I'm going to take C more seriously. I have to move myself forward from Turbo C programming.

It is true. How much of the truth is credited to "longtime" and how much is credited to "C" is up for debate, though. Personally I'd put most of it on the "longtime" side, and observe that if you want to learn how to write really efficient, performant code quickly, there are a number of other options developing where you can skip the part where you stab yourself with the language for five years learning where the sharp bits are. C is not dead yet, but I'm feeling increasingly confident we've finally entered the era where its days are now numbered.

I'm a long time C programmer, and in the embedded space, which is a burgeoning development area, C is indispensable. No libraries or external linkages, just raw close to the metal power.

News of C's demise is premature.

For the foreseeable future there will be application domains in which there will never be enough processing power to satisfy the subject matter experts that work in these domains. A very short list includes: atmospheric modeling, CFD applications, HPC simulations/optimization tools ... And that's in the science world. You also have OSs, DBs, web servers, HFT apps ... So, I'm not sure why you'd say that C's days are numbered. Do you think the next high performance OS or MPI implementation will be written in some interpreted language?

Based upon what? From what I've seen C is doing great; it has its particular domains, which in my opinion are low-level programming and performant, portable code with a small footprint.

I can't recall seeing a new language challenging C in the aforementioned areas.

"it has it's particular domains which in my opinion are low-level programming and performant, portable code with a small footprint."

I realize you're defending C, here, but I'd just like to use you for a moment to ask why this sort of thing gets stated over and over and over on HN.

C is good for low-level, systems software, embedded systems, and speed.

You know what else it's good for? Hundred and hundreds of desktop applications, userspace tools, web backends, and everything else.

It's like we keep saying it's for low level stuff, but thousands upon thousands of developers haven't got the memo. Who would dream of writing a media framework in C? The folks who do GStreamer, ffmpeg, MLT. User interfaces? Nah, it's for low-level stuff. Except, you know, GTK and Clutter. It's not for Web programming either, except maybe Apache and Nginx. You wouldn't check your email with it (mutt), or edit code with it (Vim), or make music with it (CSound, SuperCollider), or draw pictures with it (Gimp).

Maybe you're right, and that C is a bad language for all this type of stuff, but good grief -- there are a lot of developers shipping working, high-level systems in this language. And it's not like they haven't heard of Haskell or Java or Python -- or C++. In fact, we might ask when Java or Python is seriously going to challenge C in these areas.

Agreed, but I was mainly focusing on the aforementioned domains in which C is the 'de facto' standard due to its properties.

However I believe there are lots of applications written in C because that was either the 'language du jour' back when they were written or because the author was very proficient in it, which could be rewritten in something like Go, Java, C# or even Python and similar without any perceptible loss of performance.

Looking at things where C dominates to this day and shows no sign of weakening, I see operating-system level code (kernel, driver, userland<->kernel interface code), libraries for cpu intensive workloads (audio/video codecs, compression/decompression, low level game framework components etc) and of course cpu intensive applications (encoders, archivers, graphics manipulation software for 2d/3d etc). I guess I should squeeze in version control software here as well :)

However, there exists a wide range of applications outside of these areas where I think C isn't particularly competitive these days, as its strengths are dwarfed by its weaknesses.

That's not to say that there is anything wrong with writing such applications in C, particularly considering aspects such as proficiency and familiarity, if you know C very well and feel very comfortable programming in it then it's less likely you'd want to switch to something else for 'convenience'.

Based upon the increasing number of languages showing up that combine high performance with higher levels of abstraction than C, and the increasing number of serious languages gunning for C specifically, like Rust.

C has ridden for a long time on the fact that we didn't know how to combine high performance and systems functionality with high abstraction languages, so you had your choice of C or C-like and fast, or high abstraction and slow, like Python or Perl or Ruby. This gap is closing, fairly quickly now, and once it does C will start experiencing a rapid decline into a niche language, rather than the lingua franca of computing. It has advantages that have kept it going for a long time, but it has terrible disadvantages too, and once the two are separable, people are going to want to so separate.

Already a great deal of what was once automatically C or C++ has gone to some sort of bytecode (Java or .Net), even on things you would once have simply assumed to be "embedded", like your cell phone. The decline has already started.

Of course it won't die. Computer languages never really die. You can still get a job doing COBOL, and in fact get paid quite well. But 10 years from now, I think on the systems considered "modern" at the time, it will not be the "default" language anymore.

Rust is the sole _potential_ C competitor I've come across, but the language isn't even finalized last I checked, so it hardly constitutes a challenge to C as of yet. A new language showing up doesn't mean it will gain any traction.

As for the higher level languages closing the gap, that is certainly not my experience and I've seen no benchmarks to this effect. And I'm not talking about Python or Ruby and the likes, which are in an entirely different category; I'm talking about C#, Java, etc. They are still a lot slower on cpu intensive code, and they certainly don't fulfill other important properties of C like a small memory footprint.

The notion that there was some 'magic piece of the puzzle' missing which has been solved in making high level languages perform like low level languages like C comes across as nonsense to me. Higher level languages sacrifice speed for convenience and safety, depending on your demands that can often be a very wise sacrifice, in the areas where C excels it's likely often not.

Also cell-phones (which are really the upper-end of 'embedded') run a kernel and system-level code written in C doing the heavy lifting system-wise, also performance demanding applications (higher-end games) are often still written in native code on these phones.

There's certainly room for a more 'modern' C, maybe something else has replaced C in 10 years, but I don't see it being any of the languages we are discussing today, not even Rust. As it stands, I think C will remain dominant in the aforementioned domains.

Guess we are going to have to disagree, history will prove one of us right I suppose :)

ATS is far better than Rust and C.

I don't know, Rust has Mozilla behind it, and lots of active development and (for a new programming language) a pretty decent community.

ATS is hosted on SourceForge, and looks to be someone's research project. That's totally fine, and it may actually be _better_, but the real world often doesn't care about better.

If I was a betting man, I'd be putting chips down on Rust. (and I sort of am...)

That said, the more programming languages the merrier!

Rust may be more successful but at this time ATS is a better C replacement. Mainly because Rust is currently tied to a runtime and garbage collector. But also because ATS has better features for describing the API boundaries when calling into C and back (ie. FFI).

Rust will grow into these areas I'm sure because Mozilla will need them for Servo and safer programming.

Seems fair.

Care to elaborate? I am curious; I don't know ATS.

The positives about C:

It's high-level assembly language. Thus, it allows you to write code with a very small barrier between you and the machine instructions, but with enough of a barrier that you can reuse and organize your code logically. At the same time, it gives you most of the speed and raw bit manipulation of machine code.

The negatives about C:

You're almost as likely to introduce completely silent catastrophic memory corruption with every operation as you are to actually accomplish the task you're trying to do. Most of the things which would normally be bugs are also now major security breaches. C's structure also makes it extremely difficult to do static analysis, making it difficult to find bugs and security vulnerabilities, and to make performance optimizations.

How ATS helps:

ATS addresses a lot of these issues by emitting efficient C code. However, at a higher level ATS combines C with an extremely strong type system (dependent typing) which allows you to do things like verify statically that there are no out-of-bound memory accesses through typing. It can do even stronger things like prove your code correct.

Here's an example of how ATS makes a C API safer, similar to the API in the K&R example (it deals with memory/string copying): http://www.bluishcoder.co.nz/2012/08/30/safer-handling-of-c-...

ATS provides all the low level hackery that C can do but allows more information to be encoded in types to ensure that it is safe. It also provides high level features from functional programming languages like higher order functions, pattern matching, etc.

Any new language will have a maturity gap to make up for that will be hard to bridge.

Java has to some extent displaced C++ (and the .net framework has done some more of that), but I haven't seen any language that displaced C in an arena where C is strong.

For rust to make this happen they'll have to finalize the language spec, gain a decade+ experience in what the quirks are (these things only come out over time it seems), sway a generation of programmers to adopt it rather than the incumbent.

Go is shooting for the same space and it already lost the plot in several aspects (for instance: newer Go versions break older code).

Go is not shooting for the same space. Go is garbage collected.

Pre-version 1, there were no promises that your code would not be broken. In fact, quite the opposite is true. It is hard for me to classify this as a failing of Go when it was exactly according to plan. Now, if future versions break version 1 code...

Acknowledged; however, I also think that C's inertia will keep it going for a long time. Because System X was written in C, the next iteration will be written in C, goto 10. Also, for scientists (and others, I'm sure), it is hard to convince them to learn a new language when they are comfortable with their current language and are not convinced of the benefits of switching.

If you're going to take C more seriously, then I suggest you take Lua and the Lua VM more seriously, too. The reasons why:

* You can put the Lua VM in any C project, quite easily.

* The Lua VM is exceptionally easy-to-understand C, and is highly portable to boot (extreme platform plasticity), thus: a great project to learn from.

* The field of scriptable VM-hosting has a bright future in software development.

Not to mention:

- lightweight ABI, accessible from just about any other language on the planet.

- huge existing base of liberally-licensed code available for re-use

- basically the only language that is not a) legacy-stamped or b) bloated beyond learnability after more than two decades of use (or, in C's case, twice that.)

I seldom use C myself, but its strong points are undersold by the "garbage collection is to slooooow" kiddies.

Though it might come across as 'dealing in absolutes,' Zed is making the broader point that when one starts learning programming, it's not good to put any book on a pedestal and blindly follow their style - without working through the code.

It's similar to literature where people tend to ascribe the word 'classic' and yet rarely bother to read, understand or appreciate them.

Your example of unsafe functions might work for you but might not be the best example in a book which is being revered as "the classic book on C" and which influenced almost every major language and book thereafter.

As an interesting analogy, some great thinkers have heavily criticised Shakespeare for some terrible imagery - notably Wittgenstein and Tolstoy.


I agree - that's why I started by saying that Zed is right to point them out, as I realise that he is teaching people new to C (and potentially to programming). I added my caveats just to point out to the slightly more experienced programmers here that it's not quite the whole story.

I'm basically replying to this comment so I can find it again.

I think your point about unsafe functions being fine as long as access to them is only possible through controlled access points that verify sane input is one of the most interesting points I've seen in a while.

This is exactly my experience working with engine code--there are some very clean, fast, precise, and horrifically unsafe functions in our rendering system and other places that are acceptable because we can make guarantees about what data gets there from elsewhere.

The mental image is an access panel, behind which lie hundreds of whirling razor-sharp gears--presumably if you open it you know what you're doing and are careful.

This is what people pushing for abstraction are about, and why they're correct in some cases.

    // use heap memory as many modern systems do
    char *line = malloc(MAXLINE);
    char *longest = malloc(MAXLINE);

    assert(line != NULL && longest != NULL && "memory error");

    // initialize it but make a classic "off by one" error
    for(i = 0; i < MAXLINE; i++) {
        line[i] = 'a';
    }

So, you create something that does not fulfill the C library invariant of what constitutes a "string", and then pass it to a copy function that assumes this invariant? It isn't a fair thing to do, and frankly, I doubt many beginner programmers care about things like this. Yes, they may run into such a "defect" and be very miserable for a while, but that will just teach them about debugging, and most importantly, invariants.

Zed, I appreciate your work, but if this is the direction you'll be taking with these articles, then don't bother.

> So, you create something that does not fulfill the C library invariant of what constitutes a "string", and then pass it to a copy function that assumes this invariant?

From the article:

> Some folks then defend this function (despite the proof above) by claiming that the strings in the proof aren't C strings. They want to apply an artful dodge that says "the function is not defective because you aren't giving it the right inputs", but I'm saying the function is defective because most of the possible inputs cause it to crash the software.

Zed's point is not that K&R is bad because their example code doesn't match C library invariants, he's saying it's bad because it encourages people to write code that blindly assumes C library invariants will always hold.

Certainly, if you're going to write C code there's some things that really do require blind trust (for example: that your code will be compiled by a conformant C compiler), but "all strings are safely null-terminated" is incorrect so often, and the cause of so many historic security vulnerabilities and crashes, that perhaps we shouldn't be encouraging new C programmers to do it.

So in a dynamically typed language, we then add type checks everywhere?

Because we wouldn't want a function to accidentally process data that wasn't meant for it, no?

The equivalent errors in dynamically typed languages don't lead to critical security vulnerabilities like they do in C. You get a nice exception and a clean crash, not a hook to grab root.

Realistically, just about every mainstream dynamic language today has its major implementation(s) written using C or C++. Some of them have significant amounts of C code underlying their common libraries, as well. Some of the most popular third-party libraries or modules are in fact nothing more than thin wrappers over existing C or C++ code.

Don't think that you're escaping C or C++ just because you're using Ruby, or JavaScript, or Python, or Perl, or Tcl, or even Java. Don't think that you aren't as vulnerable using a dynamic language as you are using C or C++ directly.

Furthermore, it is quite easy for dynamically typed languages to suffer from very serious security vulnerabilities. There was one affecting Ruby and Ruby on Rails widely publicized just a few days ago. You can read more about it at: http://news.ycombinator.com/item?id=5002006

But the beauty of modern programming languages is that the responsibility has gone from sole programmers to actual implementation developers. You would expect that people who implement the runtimes and libraries are far, far more knowledgeable, experienced and trustworthy. And it often is so -- the end user (programmer) can't be trusted to know all about security implications and possible vulnerabilities, let alone how to exploit them!

The whole idea that "Well, it's insecure because you program in an insecure way!" is outright idiotic, and should be killed. If that means getting rid of C and C++ and people who write in these languages (hey, myself included!) then so be it. The faster the better. Sure, there are cases in which it's hardly possible, but I'd take 10 times slower computer for 1) fewer bugs 2) more advanced software 3) cheaper software 4) far less security concerns any day.

Now, I'm going to close the C++ project and go write some Python. Makes me feel happy and far less stressed, although C++ does damn well in comparison to C.

But using a language like Ruby or PHP, for instance, doesn't really lead to fewer bugs, or more advanced software, or cheaper software, or fewer security concerns in practice.

What it often actually leads to is untalented developers creating a lot of bug-ridden and vulnerable code extremely quickly. It's efficiency in all the wrong ways.

Do you remember that Diaspora social networking project that received a lot of hype a couple of years ago? It was a Ruby on Rails app, and the early releases were absolutely filled with some particularly nasty bugs and security flaws. The only reason they were eventually fixed is because the code was made public, and people pointed out these flaws. There is a lot of Ruby code out there, for instance, that isn't public, yet is still riddled with the same types of problems.

That's not to say that the same isn't true for Python, or Java, or C#, or C++, or any other language. But we shouldn't be claiming that using a language like Ruby or Python somehow leads to more secure code. It doesn't, and it's dangerous to think that it does.

> But using a language like Ruby or PHP, for instance, doesn't really lead to fewer bugs, or more advanced software, or cheaper software, or fewer security concerns in practice.

Which domain are we talking about? Of course web-based applications have their own problems, but imagine if idiomatic Ruby or PHP code was vulnerable to buffer overflows, double-free/use-after-free or format string vulnerabilities on top of the current problems; would you still say that languages don't matter when it comes to software development problems and issues? Essentially, what you're saying is that modern programming languages aren't any better in practice in said regards than C. Honestly?

Of course no language can prevent outright bad code, but a language, by its design, can eliminate issues related to, for example, type safety and memory safety. Consider Rust as an example of this. What this means in practice is that the language by its design manages to eliminate these issues. Code is less prone to bugs and has no security concerns related to these issues. More time for validating correct behavior and fixing misbehaving parts.

> But we shouldn't be claiming that using a language like Ruby or Python somehow leads to more secure code. It doesn't, and it's dangerous to think that it does.

What on earth am I reading? What are the equivalents of a buffer overflow or a format string vulnerability in Python or Ruby? How do I execute arbitrary machine code with Python or Ruby when malicious input is given to the vulnerable program?

Still, there is a whole class of memory errors and exploits that you can stop caring about once you have managed code. The tradeoff is obviously performance, though as Java/.NET show us, not necessarily too much of it.

>I'd take 10 times slower computer for 1) fewer bugs 2) more advanced software 3) cheaper software 4) far less security concerns any day.

Fair enough, but as soon as you build your slow app the competitive market will want to buy the version that runs 10x faster. In some cases it won't matter, but where the software has to run in real time it very much does. There's no escaping C.

Sure, there's no escaping C, but that's mainly because of the investment we've put into it. The same goes for C++, though C++ is far more manageable from a security point of view. Today we have modern languages that are only a tad slower than C, yet which guarantee safety and control; see Ada 2012 for example. Optional unsafe/unmanaged code blocks can also help maintain performance in critical parts while keeping the non-critical parts safe/managed. This goes as far as optional garbage collection for certain objects and manual management for others. That's flexibility, none of which C provides, and it's what makes C a horrible language in today's world.

If only someone would start rewriting de facto low-level infrastructure such as kernels (say, Linux) and userspace tools and programs (servers such as Apache, language implementations such as those of Python and Ruby, libraries, ...) in something like Rust or an equivalent that guarantees both type and memory safety, has a strong emphasis on concurrency, encourages immutable state, and so on.

Maybe one day we simply won't have to care so much about what's "secure" and what's "vulnerable", because the concept of a software vulnerability is destructive. Yes, it employs people, but those people create no real value; they just fight the destructiveness of vulnerable software. In an ideal world, their work would be unnecessary.

There is a very important concept in software called "good enough". Sure, there are languages that could be better than C/C++, but 90% of what we see on a desktop is written in C or C++. For a bad language, that's good enough, I would say.

It would be different if we had a language with the same performance characteristics and dramatically better high level features, but to this point we still don't have it. That is why software developers are in no rush to move from C to another of the languages that have been discussed lately.

I would rather attribute the current pace of things and C and C++'s popularity to what has been invested in those languages. Tons of big projects are written in C and C++, many of them begun when performance was a major issue, unlike today. Also, finding contributors for C and C++ projects is going to be far easier than for projects written in, say, Go or Rust or any other reasonably suitable language. Not to mention libraries.

For a typical new desktop application, C and C++ have been long dead for at least a decade now, thanks to C# and .NET. It's a tad different on Linux and Mac though.

If we were starting from scratch, I'm sure C wouldn't have the popularity it had 20 years ago. The language is inferior by design by modern standards. Yes, there are domains where it's still relevant, but the consumer PC (let alone mobile) is not one of them. If C were relevant there, I'm sure we'd be writing mobile apps in C instead of, say, Java.

  > There was one affecting Ruby and Ruby on Rails
What has it to do with Ruby, besides the fact that RoR is written in Ruby?

Why not use a modern statically typed language instead?

>Don't think that you're escaping C or C++ just because you're using Ruby, or JavaScript, or Python, or Perl, or Tcl, or even Java. Don't think that you aren't as vulnerable using a dynamic language as you are using C or C++ directly.

Actually this is completely backwards.

Think exactly that you are NOT AS vulnerable as using C or C++ directly.

That it's C/C++ underneath has little importance.

It is FAR MORE difficult and FAR LESS common to hit runtime/interpreter bugs than to produce bugs of your own in the higher-level language.

>Furthermore, it is quite easy for dynamically typed languages to suffer from very serious security vulnerabilities.

Of a different kind, that doesn't pertain to the current discussion.

So they have infinity minus one problems. You still have critical systems crashing or silently misbehaving.

Does the absence of type checks in dynamically typed code result in remote execution vulnerabilities typically?

It's twenty goddamn thirteen, stop writing trivial buffer overflow vulnerabilities already. And that applies whether you are a novice or an expert.

When you use a dynamically typed language you have CHOSEN to live without those concerns -- you traded checking for flexibility.

In a statically typed language, where you already made the effort of using types, it would be a shame for your program to die or cause havoc because you didn't also think about enforcing some invariants.

Plus, those kind of errors in C can cause buffer overflows, privilege escalation and such.

In a dynamic language it's usually just a halt and a stack-trace.

(And actually there is a movement to do exactly what you say: add type checks to dynamic languages. See Dart and TypeScript, with their optional type annotations and checks. But if it were possible to have those checks by pure inference, without adding anything to a dynamic language's syntax, most people would jump at it instantly.)

You are being too kind. I was expecting some earth shaking discoveries, only to find the author doesn't understand C.

I concur, I wonder honestly whether this is a matter of hubris becoming more important than accuracy.

The accurate fact is that K&R C is a book about C. It is not the end-all, but rather an introduction to the language. Sure, it has thorns. Sure, you'd be a fool to adopt the style from it; this speaks more of the culture of its readership than of the book itself, however. The authors are very honest that their samples are an attempt to engage the reader's attention in the language, especially the ingenue newcomer, the non-professional C programmer.

To that end, the book succeeds; new C programmers get an introduction, a light read, a good set of nomenclature to understand the topic further, and so on. It is not intended, in spite of the cultural proclivity towards these things, to be "A Bible of C".

And if it were, no professional C coder worth their salty words these days would be without the New Testament, right alongside K&R on the neglected end of the bookstack; that book is of course "Expert C Programming: Deep C Secrets", which explains rather a lot more about the thorns of professional C, and more, in a manner as comfortable as K&R itself.

In my opinion, Peter van der Linden has already done for K&R what Zed doesn't seem to have the humility to do: proven its value to the newcomer in becoming one step closer to a professional.

Agreed. Once you know C, Deep C Secrets takes it to a whole new level and is a great body of work. My 3 C books are

1. K&R
2. C Traps and Pitfalls
3. Deep C Secrets

Zed and his writings are not to be taken too seriously.

I suspect this one is a diversionary tactic.

I guess Zed Shaw suffers from nerd burnout. As in, a sort of more emotional burnout from having had to deal with them all the time in the past - or at least, that's what I get from some of his writings anyway. So I imagine him popping this stuff in as a sort of early warning system. It's all true enough to be right, and true enough to get his point across, BUT IT'S NOT TRUE ENOUGH FOR A NERD. So any time somebody complains about his strcpy example, or 0-terminated C strings, or whatever, that's his nerd alert. This person is not worth dealing with, and now he can block them, or set up a mail filter to put their email in the junk folder, or whatever, without having had to invest any time in finding this out the long way.

There was also a bit in one of his essays about the way ruby fans were always these stupid armchair pop psychologists.

I think it functions less as a "nerd" alert but more as a "recognizes I may not be as profound as I like to imagine myself" alert.

I mean, he's right.. but he's not being profound. You may as well tell me that I should be careful about losing precision while using floats. Yes... no shit?

Yeah, when your entire career and persona is built around being the only intelligent person in an industry full of idiots, it is natural to need to drive all the knowledgeable people out of your personal space.

"C is a ghetto"

I'm not trying to be "kind". I usually like reading Zed's technical work and just express my opinion about this particular piece. I agree that debating the theoretical provability of correctness of programs may be interesting in some academic courses, but using it to bash K&R for their book's contents is, as I said, unfair.

He certainly understands C programmers well enough to know that when they write code like the code he critiqued, it's like handing a kid a loaded shotgun.

It will lead to misery.

Yes, we are all sure you understand C better than ZShaw.

I expect you will point us to your prodigious output in the language, and that it was only by accident that you forgot to add any specific points of critique to your comment.

"if this is the direction you'll be taking with these articles, then don't bother"

But this thread on the book is a pretty good resource on issues and gotchas for newbies.

I largely use Python, I've dabbled in C and always mean to learn more. K&R to me is the touchstone for that, just because so much of programming culture stems off of it. I find knowing these historical patterns helpful for understanding how programmers talk.

For someone like me, respectful critique of its style and decisions helps separate the good from the less-helpful. Whether Zed intended to start that discussion or not, it's still helpful.

>So, you create something that does not fulfill the C library invariant of what constitutes a "string", and then pass it to a copy function that assumes this invariant? It isn't a fair thing to do

Well, life as a program is not fair either.

The problem is your function can be used in many contexts, including by other people. You should not expect them to be fair, you should make your function robust.

There is no reason to assume that the correct length parameters would be passed to safercopy. Indeed, there are many buffer overflows and off-by-one errors in C programs which involve buffers with explicit size values instead of null-terminated C strings.

The real problem with C is that it relies on bare pointers, where it would have been better to use slice-type structures that describe a buffer by pairing the base pointer and size, so that they are naturally kept in sync. This article takes a lot of time to "deconstruct" C strings, but never gets to the real issue.

The "stylistic issue" is also debatable. With the indentation given in the example, nobody would think that the "while-loop will loop both if-statements", as the author claims.

In short, Zed had all of Rob Pike's sneering anger and none of his technical mettle.

What has Zed said about C that wasn't already answered more thoroughly by Go?

    A = {'a','b','\0'}; B = {'a', 'b', '\0'};  safercopy(2, A, 2, B);
    A = {'a','b'}; B = {'a', 'b', '\0'};  safercopy(2, A, 2, B);
    A = {'a','b','\0'}; B = {'a', 'b'};  safercopy(2, A, 2, B);
    A = {'a','b'}; B = {'a', 'b'};  safercopy(2, A, 2, B);
This analysis only tries different values of A and B, not the lengths. A proper analysis of "for what values does it fail" should include all parameters. What happens if you do `safercopy(3, A, 2, B)` or `safercopy(3, A, 3, B)`?

Exactly what I noticed. The "safer" function relies on the length arguments being correct, just as the copy function relied on the strings being null-terminated. The safer function is more explicit so less prone to error, but both rely on the programmer doing the right thing, which is what he was trying to avoid...

The good part is that valgrind will hopefully find that it read outside bounds immediately, as opposed to code depending on correct string termination which will work with correct test cases.

Zed is dinging examples in K&R for incorrectness, but he's inadvertently deconstructing the notion of correctness. It's more contextual than most developers want to believe.

Agreed. And it starts from his very first example, about the implementation of the copy() function. C is not a type-safe language, and programmers who use it know that; sometimes it can even be an advantage. For instance, if I'm writing or using such a copy() function that does not ask for a length to copy, I know it is because it will need a proper C string. Others may know that their functions use a global maximum length and that the inputs they provide respect it. But no half-decent C programmer will treat this function as totally safe on any possible input, especially after seeing the implementation.

The "correctness" the author is asking for here is not what you want from a typical C function. If you really need this kind of "correctness", then maybe you are using the wrong language and should check out either a high-level, tolerant scripting language or a statically typed one.

Yes, and that is just one level. When you write about programming you have to be very clear about what you are trying to get across, and when you put that focus into practice other things suffer because you are competing against the rest of the world for the reader's attention.

Kernighan and Ritchie could have decided to write the book as absolute hard-asses, making the most bullet-proof copy routine imaginable, but in the end they would've been writing a different book, not a primer for a language.

If you're illustrating a concept, such as iterating through an array to find a specific value that will be there (because the data structure being examined is defined to contain it at some point), then writing a function that assumes the value won't be there is didactically bizarre.

Zed is suggesting that the teaching point of the code sample be ignored in favor of an altogether different and less useful teaching point. This is not a case of not understanding C (which he clearly does understand) but of not understanding how to teach.

At best, his entire point could be addressed in K&R by a cautionary footnote or a later discussion of code hardening.

the point of the exercise is to understand how a copy works, and stylistic issues aside, the point is made.

if you supply a function with inputs outside of its specification (NULL-terminated strings), then undefined behaviour is (by definition) going to happen.

besides, what's to stop someone from calling safercopy like so:

    safercopy(strlen(str1), str1, strlen(str1), str2);
then strlen will fail (albeit a bit more safely - perhaps).

it's a safe bet that in production code we'll not be working with fixed-length strings, so we need to get the length of the string somehow. all his safercopy does is shift a problem that he himself points out is impossible to solve: how do we tell a NULL-terminated string from one that isn't?

the only real solution (i can think of) is a string class, where the constructor is guaranteed to return valid (or no) strings. then (assuming other functions can't overwrite our memory - already an unsafe assumption) we could guarantee a safe string copy.

programming is hard.

I am wondering when the author will take on correcting errors in the MMIX code in Knuth's books. He will at least earn a hex dollar for his efforts.

Jokes apart, I admire the bravery in questioning K&R C's status. I have only a few personal insights to share as I don't code much these days.

1. I was introduced to C in the '80s via some popular book. K&R not only taught me C but was also my entry point into the systems programming world. Before K&R, BASIC programming books never let me deal with memory or interrupt vectors the way C did.

2. C has a great power dynamic built into it. You are on your own in dealing with this power. It's you who crashed the machine, not C and surely not K&R C.

3. Almost every language since C has borrowed something from C. Hence, anytime I saw a familiar notation or code block in any language that reminded me of C, I got confidence that I can learn this language.

K&R's many virtues give it a unique status. It does something no other C book or website can do: it is the word of the language's designers themselves, sharing the reasons for the choices they made.

There's an analysis of his deconstruction here:


Even if I made a function copy(char to[], char from[], int lenTo, int lenFrom), it would still be incorrect by his reasoning, because most of the possible inputs would still cause the software to crash. I could simply pass a wrong lenTo and lenFrom.

Garbage in, garbage out.

I don't understand it. His solution to a non-safe introduction to a non-safe language is to slap defensive checks everywhere? I am not sure that scaring his students shitless by having them check every input is as effective as a concise explanation of why and when you should check your inputs.

Criticizing K&R because of their safety assumptions is a faux pas.

Though Shaw raises a valid point based on the fact that we program in more diverse and hostile situations today than K&R did back in their days, I'm unsure how deep one needs to go into program correctness when teaching a language. Wouldn't it suffice, for example, to have program correctness highlighted as a chapter?

Take this -

> Also assume that the safercopy() function uses a for-loop that does not test for a '\0' only, but instead uses the given lengths to determine the amount to copy.

It is possible to write a safercopy() conforming to that loose specification that will not terminate. Just make the test something like "i != length" instead of "i < length". Then you can supply negative length arguments and get it to fail. Of course, that would be stupid, but it already illustrates the art of specification. Well, with finite-precision integers, "i != length" would terminate eventually due to wraparound, but it would take until the end of the universe if you'd used 128-bit integers. Even more simply, `safercopy(1,2,3,4)` can crash the program.

Is the moral of this story that programs are not valid outside the context they were created for? .. or is it to never use data structures whose integrity cannot be proved without failure? .. or is it that proving a program's correctness using some method only indicates a failure of imagination? .. or, to put it differently, that you can only prove a program wrong but never one right?

This article is bashing just for the sake of bashing. While it pretends to offer constructive critiques and solutions, it fails miserably. One can certainly have productive discussions about the proper way to handle buffers in C, but this isn't it. Zed Shaw should stick to Ruby or whatever area he's actually good at.

He did the same thing in Ruby first, before everyone got sick of him and he moved to C.

Annoyingly clever. Even while writing JavaScript I have to debate whether I should be annoyingly clever or not.

  /* bad use of while loop with compound if-statement */
  while ((len = getline(line, MAXLINE)) > 0)
    if (len > max) {
        max = len;
        copy(longest, line);
    }
  if (max > 0) /* there was a line */
    printf("%s", longest);
What do you do?

(I'm not certain what aspect you are trying to discuss, but I'm assuming braces, as that seems the primary point in the article.)

My personal solution (again for JS) would be to use a new line with no braces to split up an if-statement, but never to nest a braced statement as part of a pseudo-one-liner, nor to nest many levels of one-liners, as these situations could lead to confusion.

Eg for the above;

  while ((len = getline(line, MAXLINE)) > 0){
    if (len > max) {
        max = len;
        copy(longest, line);
    }
  }
  if (max > 0)
    printf("%s", longest);
This is a personal preference. I find it adequately splits up a one-line `if(max > 0) printf("%s", longest);` statement so that it is clearly identifiable as an if (while/for/etc.) block, without the verbosity of the extra line or two for braces, which I personally find makes code harder to read.

If I'm intentionally writing a one-line if I will write it on one line. I think using an indented second line without braces is less clear, and more prone to problems later when the code is modified.

So I'd write:

  if (max > 0) printf("%s", longest);

or

  if (max > 0) {
    printf("%s", longest);
  }
but never

  if (max > 0)
    printf("%s", longest);

I'm SO glad Zed has pointed this out! I dislike code like this and I dislike it even more when people say K&R style is the best, it's classic, believe/follow the creators of the language!

Fu that.

It's error prone, just like Zed says. Add braces, don't be lazy, it makes the code flow easier on the eye.

IMO, In JS, it's painfully dangerous to be "too" clever as well.

Sometimes I like to think about it like so: Will someone else "with less skill than me" be able to follow this code after me? Then again, being human I sometimes fail to think this way :(.

The overly clever argument is only valid if you think the point of K&R is to teach good programming style. In fact it's to teach you to read and write C programs. In many cases the learning point is achieved by the reader puzzling out why the code works and learning from this.

Complaining that this isn't easy enough is missing the point.

Arguing that the language shouldn't allow such constructs is again outside the scope of the argument. K&R designed the language, so presumably they agree with the design.

- if you hit a line longer than maxline you'll end up with a partial line in your 'longest' buffer

- if the line is so long that it will spill across several chunks of 'MAXLINE' length then you'll end up with the next to last bit

- you should probably use a character reader that uses realloc to resize the buffer area until even the longest strings fit or the program fails

- your getline invocation is wrong, the proper one is getline(&lineptr, &cursize, fp);

And the 'real' getline takes care of most of the above gripes, for instance all you'd have to do is to swap the pointer to 'longest' for a fresh one for the next iteration after finding a new longest entry.

I have been through both, K&R and Shaw. K&R is the resource I point new developers to who want to learn C.

This book introduced me to modern languages, from my previous experience coding in assembler. It inspired the millions of lines of code I've written since.

I now understand why my subsequent programs, and those of many in my generation, have been riddled with bugs for 3 decades.

K&R is hardly the only book written about C, nor is it the best. You should definitely not stop learning about a language after you've read just one book about it.

K&R C was best practice in 1980 or so; since then we've learned a lot about C, about what to do and what not to do. If you still program C like it's 1980, you can't really blame that on a book from 1978.

I am midway through K&R, but now I am curious, what other book would you suggest?

Popular suggestions (not introductory):

    C Interfaces and Implementations (David R. Hanson)
    Expert C Programming (Peter van der Linden)
I'm also a fan of C Programming: A Modern Approach (K. N. King) as an introductory book, but it's very different in intended audience from K&R. King is writing for students, and he assumes nearly nothing. K&R are really writing for experienced fellow programmers, I think. So it's an apples to oranges comparison. But King is certainly more modern. He includes a fair amount of C99 material. The writing can't compare with K&R. It's nowhere near as dense or elegant. But if you find parts of K&R a bit too dense and elegant, it's a good trade-off. (I used them both in tandem while trying to learn C this summer. It worked well for me. When King got too verbose, switch to K&R. When K&R lost me, switch to King.)

King's book was my introduction to serious programming when I was younger. In an era of my life where I usually prefer ebooks, I still have King's book, in dead-tree format, and love it.

IMO, this is a better book at explaining C clearly: "Advanced Programming in the UNIX Environment" by W. Richard Stevens. Don't be fooled by the title; it's a beginner book.

> Don't be fooled by the title.. it's a beginner book.

I never know what to do with comments like this. It's obviously not a beginner's book. At the very least, it assumes that you know C. If your response is "C is a small language. You can learn it by reading <some-other-thing>.", you're admitting that APUE is not a beginner's book. If your response is, "C is a small language. You can learn it by reading APUE itself.", then I call bullshit. Is it theoretically possible, sure. But is it a natural way for a true beginner to learn C? Certainly not.

> It's obviously not a beginner's book.

What do you mean by this? You obviously have NOT read it. I have. I'm telling you it is a beginner book. It does not assume you know C or anything else. It has nothing to do with the size of C. Grabbed it off my bookshelf -- he starts off telling you how to login to a unix system. How advanced can it be?

This is literally the book that taught me the basics of C within a week -- the first language I learned (15 years ago). So it is definitely a beginner book IMO.

Edit: I will say, he does get into some more advanced things toward the end. Which is why I like the book.. he starts off very basic, and by the end, he's covered the basics of interprocess communication, shared memory, etc.

I have read the book. It does not teach you C. It assumes that you know the language. It does not teach you anything about the syntax of the language.

He does not tell you how to log into a UNIX system. He tells you what happens when you log into a UNIX system. (That is a big difference. Compare the first chapter of Kernighan and Pike's The Unix Programming Environment, which actually does teach about terminals and explicitly talks about how to log in and what a login is -- for a complete beginner.)

Here's the first bit of code, from page 5:

    #include "apue.h"
    #include <dirent.h>
    int
    main(int argc, char *argv[])
    {
    	DIR		*dp;
    	struct dirent	*dirp;
    	if (argc != 2)
    		err_quit("usage: ls directory_name");
    	if ((dp = opendir(argv[1])) == NULL)
    		err_sys("can't open %s", argv[1]);
    	while ((dirp = readdir(dp)) != NULL)
    		printf("%s\n", dirp->d_name);
    }
My point is very simple: If you do not know C, and especially if you also do not know any programming language, it would require either magic or another book or tutorial to figure out the syntax or semantics of those lines. Why is the first include thingy in quotes and the second in brackets? What is an include thingy anyhow? What is DIR? Why do some things have a * in front of them? When are semicolons used? Not every line ends in one, but most do - why? What is the -> used for? When do I use parentheses and when do I use braces? What is argc or argv? Are those names important? What is NULL? Why are some things all capitals and some not? Does that matter? He doesn't answer any of these questions (nor should he), because the book is not meant to teach you C. It assumes that you already know C.

It is not a beginner's book.

Here's a bit of code from page 10:

    #include "ourhdr.h"
    int
    main(void)
    {
    	printf("hello world from process ID %d\n", getpid());
    	exit(0);
    }
Ch 7 is the environment of a unix process. 7.2: "argc is the number of command line arguments and argv is an array of pointers to the arguments. We describe these in Section 7.4." exit codes are discussed in section 7.3, where he also talks about 0 in exit.

Granted, he does not discuss the syntax or NULL...

Looking through it.. I'll agree it isn't a beginner book. Intermediate at worst though.

Hmm... Having not seen the first edition of the book (which is the edition you must have given that you used it 15 years ago and it was updated 8 years ago), it's possible that your assessment of its elementary content is specific to that edition.

However, the second edition of the text is by no means elementary, unless you are conflating beginner literature with comprehensive literature, and the text is certainly comprehensive.

I'll grant you that Chapter 1 is a summary and introduction to the entire book with little challenging or novel material for someone experienced with UNIX and C, but many advanced texts have introductory sections. A large portion of "The Art of Computer Programming: Fundamental Algorithms" is essentially a review of the elementary mathematics necessary for the analysis of algorithms, but I would never call TAOCP a "beginner's book"!

This thread has been interesting for me. The first time I've opened the book in 13-14 years probably. Not exactly what I remembered. So just ignore my posts :)

Harbison & Steele, "C: A Reference Manual".

You could do a lot worse than getting a copy of Sedgewick's "Algorithms in C".

Don't blame K&R because it took you 30 years to notice that one of the many widely available string libraries might be a good idea.

It was a poorly worded joke. US, irony etc.

That book was a first step for many of the devs that build many of the technologies that support everything we do online today.

Ironically, I most likely wouldn't hire Kernighan or Ritchie because of their poor programming style.

The one has passed on, the other is currently otherwise engaged. You couldn't afford them, and you're so arrogant that anybody halfway versed in the art will probably not want to be associated with you to begin with.

You'd be HONOURED to work with people of that calibre, but instead you feel you need to show you're better than they are by kicking at them. Trust me, you can't kick that high up.

They (along with Ken Thompson) created C and Unix, the most successful programming language and operating system in history, both still in heavy use over 40 years later.

Long-term success might be a better metric than "style". Just sayin'.

Dennis Ritchie created C. Also, you might have a hard time getting him into your office for an interview; he died just over a year ago.

Your hubris takes my breath away, which makes it difficult to laugh at your ignorance. Painful.

I am sorry, but there was no 'hubris' at all. You obviously misunderstood my comment, along with a few other not very bright people. Recently I had to conduct a number of job interviews on C/C++, and then I came across this article, and the thought came to me that great people cannot be measured with a scale of mediocrity, and that the mainstream in every field of human knowledge tends to be very focused on non-essential detail. Sapienti sat, but in your case, the reverse.

Do you have this a lot, that other, not very bright people misunderstand you?

Looking at your responses, I see you are a guy with very straightforward, stereotypical thinking; you are aggressive, and your little taunt is also childishly primitive and straightforward. Does it make sense to talk to you at all? So the answer is 'no', of course, I do not 'have this a lot', as I usually simply ignore this kind of person, but unfortunately there is an option to downvote a comment for the benefit of the stupid.

I understand where you're coming from now. I think your delivery needed work -- it relied too much on your own mental context.

Since the OP was critical of K&R, it made sense to interpret your remark in that context rather than the one you mention.

They probably wouldn't be interested in your project, so it is a win-win.
