
Deconstructing "K&R C" - gnufs
http://c.learncodethehardway.org/book/learn-c-the-hard-waych55.html
======
tba
I'm confused. What is the "defect" in K&R's "copy(char to[], char from[])"
function?

The author notes that "the second this function is called...without a trailing
'\0' character, then you'll hit difficult to debug errors", but no function
with that signature could possibly work in this case.

The built-in "strcpy" function has the exact same limitation. Does the author
have a problem with it as well? Null-termination is a fundamental concept of C
strings; there's no reason to shield C students from it.

The other example of "bugs and bad style" in this "destruction" of K&R C is a
minor complaint about not using an optional set of braces.

I hope the remainder of the [incomplete] chapter demonstrates some actual bugs
in the book's code, because it currently doesn't live up to the first
paragraph's bluster.

~~~
mechanical_fish
_The built-in "strcpy" function has the exact same limitation. Does the author
have a problem with it as well?_

Yes. From the linked chapter:

 _we avoided classic style C strings in this book_

From an earlier chapter on strings:

 _The source of almost all bugs in C come from forgetting to have enough
space, or forgetting to put a '\0' at the end of a string. In fact it's so
common and hard to get right that the majority of good C code just doesn't use
C style strings. In later exercises we'll actually learn how to avoid C
strings completely._

This is the author's opinion, of course – it's from a book, that should go
without saying – but it's not as if the idea of avoiding C strings in general,
and "strcpy" in particular, is an oddball or unique point of view. See e.g.:

<http://stackoverflow.com/questions/610238/c-strcpy-evil>

~~~
peapicker
"the majority of good C code just doesn't use C style strings..."

Nice. In the 23 years I have worked on C language products, I've never worked
on "good C code" by this definition.

The cool thing about this guy's book, I guess, is that by avoiding all the
things about the language he doesn't like, any reader will be wholly
unprepared for C in the Real World after this book.

~~~
mechanical_fish
This chapter is _explicitly teaching people_ what the author's idea of bad C
code is: It has them read some, then tells them specifically what the gotchas
are, then asks them to code up test cases for the flaws and run them through
Valgrind.

Where's the "avoiding" here?

And which of the skills being exercised - imagining what kinds of bad things
could happen, writing executable test cases, detecting segfaults - are not
useful in the real world?

~~~
zedshaw
That's basically the point of this chapter. It's getting people to think like
a hacker and try to break the code in unintended ways. That makes them better
programmers and helps when avoiding common mistakes in C code.

Using K&R to do this is to give people a set of known good C code samples and
show how even those can be broken and misused.

~~~
ori_b
Noble goal, but the way the chapter is written doesn't really come across that
way, in my opinion.

~~~
aptwebapps
Speaking as someone who doesn't know C, hasn't read K&R, but who (usually!)
has decent reading comprehension, it _does_ come across this way.

He's completely explicit about what he's doing and why. I don't understand how
people can read that and still feel that he's being unfair.

Maybe I'm just too literal-minded.

~~~
ori_b
Calling string manipulation functions buggy and broken because they fail when
you don't pass them a string is both wrong and silly.

~~~
bonzoesc
How do you test that something in C is a string?

~~~
tedunangst
You don't, because you can't. That's basically the same as the beginner
question, "How can you test if a pointer has been freed?"

~~~
bonzoesc
It was a rhetorical question. What's the sense in having functions that
operate on "strings" if you can't figure out what a "string" is at runtime?
It's much saner to have functions operate on "strings that are 80 characters
or less" or "a structure containing an integer `length` and an array of
`length` chars."

~~~
ori_b
What's the sense in having functions that operate on "pointers to valid
memory" if you can't figure out if it's "valid memory" at runtime?

The point is that the function is not buggy. You may not like the
specification for it. You may think it should be designed differently. This is
not the same as the code being buggy.

------
blix
I don't see what the big deal is here. He throws inputs at a function not
meant to handle them and gets a segfault. Isn't this more-or-less expected
from all C functions?

On careful examination of his example code, you can see that there's no
attempt to terminate the buffer (though he misidentifies it as an "off by one"
error). Throwing arbitrary buffers into functions meant for null-terminated
strings will cause errors everywhere, not just in K&R example code. In the
next chapter, does he use strcpy in the same example and use that error to say
that we should deprecate the entire standard library?

On a more nit-picky note, the 'triple-equality trick' the author derides is a
common C idiom and one every C programmer should be familiar with. Perhaps it
doesn't belong in the first chapter, but it definitely belongs in any C
manual.

~~~
jpablo
Agreed, the example used seems very strange. Every function makes assumptions
about its parameters, and in C null-terminated strings are to be assumed unless
otherwise noted. The std library strcpy would behave in an identical way.

Yeah, I'm all about checking your assumptions and defensive programming, but
K&R is hardly at fault because you throw garbage at a random function and it
segfaults.

I find the idea of teaching modern C by pointing out problems in K&R awesome,
but the examples given are not really good.

~~~
angersock
_Yeah, I'm all about checking your assumptions and defensive programming but
K&R is hardly at fault because you throw garbage at a random function and it
segfaults._

Maybe he should write a series on "Learn Ada the Hard Way"?

~~~
beagle3
(apologies for hijacking an unrelated thread, but I have no way of contacting
you ...)

During our discussion of the redis/windows release, you mentioned your team is
about to release some library code around new-years day. Has that release
happened? If so, where can I find it?

Thanks in advance

\--beagle3

~~~
angersock
Yeah, sure!

Code's up here: www.762studios.com

It's been up about a week or so. (EDIT: we had it up on the first,
technically.)

We're pushing out a significant update to that code in the next few days, as
well as polishing the site with the bugtracker and things.

One of the big problems we ran into was function-level specialization for
certain container classes--storing smart pointers in certain cases would screw
up the reference counting. The code update will be fixing that.

If you want to get in touch, shoot me an email at cre1@762studios.com

Happy hacking! :)

~~~
beagle3
Thanks!

Looks like a solid, high quality code base from a quick look. Keep up the good
work!

------
adavies42
> Braces Are Free, Use Them

I disagree. Braces have a cognitive load, particularly if they're given whole
lines to themselves. Which is easier to read,

    
    
      while ((len = getline(line, MAXLINE)) > 0)
          if (len > max) {
              max = len;
              copy(longest, line);
          }
      if (max > 0) /* there was a line */
          printf("%s", longest);
    

or

    
    
      while ((len = getline(line, MAXLINE)) > 0)
      {
          if (len > max)
          {
              max = len;
              copy(longest, line);
          }
      }
      if (max > 0) /* there was a line */
      {
          printf("%s", longest);
      }
    

? Personally, I'd probably write it

    
    
      while ((len = getline(line, MAXLINE)) > 0)
          if (len > max) {
              max = len;
              copy(longest, line);
          }
      if (max > 0) printf("%s", longest);
    

. Consistent structural indenting makes it easy to see the boundaries of the
control structures.

~~~
SeanLuke
I'm probably the last person in the world to prefer:

    
    
          while ((len = getline(line, MAXLINE)) > 0)
              if (len > max)
                  {
                  max = len;
                  copy(longest, line);
                  }
        
          if (max > 0) /* there was a line */
              printf("%s", longest);
              
    

I fear this may be literally true: this brace style is called Whitesmith's,
and I was reporting bugs to the cc-mode indenter for emacs a while ago.
BSD/Allman and K&R styles never made sense to me.

~~~
zedshaw
Add a new statement to the while-loop. Now you've figured out why that style
isn't so hot. You now have to remember to go add the new braces, and typically
people are mentally putting them in so they forget. Same with the last if-
statement. You'll go, add the new line, indent and go "cool done". Then
scratch your head for hours wondering why it's not working.

~~~
SeanLuke
Oh, well there are two issues here: whether to always include braces, and what
brace style to use. Whitesmiths is a brace style, that's what I was pointing
out. The decision to include braces or not is a different thing. I tend to
omit braces in obvious situations, but am not beholden to it.

------
askmeaboutloom
I don't see his point, really. Yes, C doesn't have a statically testable
string type. And yes, the convention is that a C "string" is just an array of
character data with a trailing NUL-Byte. He constructs an array without a
trailing NUL-Byte - so that's not a string, but an array of characters.

The fact that copy() now happily runs through memory is the expected result of
the bug in the calling code. No, it's not useful. Yes, there are problems with
the whole approach of using a terminating value - but this doesn't seem to be
his point (otherwise he would also have mentioned the linear runtime
complexity of strlen() and the problems that arise when a string itself
contains a NUL-byte, I suppose).

Now, what is his point? This is chapter 55 [!] in a book called "Learn C The
Hard Way" and the author complains about well-known problems with the
standard-lib string convention, optional braces, and the common C idiom of doing
assignment and value-testing at the same time, _and_ calls all of this
'Deconstructing "K&R C"'?

Maybe a "What I personally don't like about C" would have been a better title.
The K&R examples are flawless. The language and stdlib are not. That's
well-known. What is new?

~~~
zedshaw
> I don't see his point, really.

The point is three fold:

1\. K&R has defects in it when the functions in it are used out of context
because they didn't include defensive programming practices considered
standard today.

2\. People can learn a lot about writing good code by critiquing other code,
even from masters, so I'm taking them through doing that.

3\. There should be no sacred cows, and people hold K&R on a pedestal without
questioning what's in it. This is really what causes #1, so I have them do #2.

That's it really.

~~~
askmeaboutloom
Noble goals. And now that you said that I don't understand how I could have
overlooked that in the first place.

Maybe it would be a good idea to put a bit more stress on the "no" in "no
sacred cows", that's an important point beyond just K&R. Nothing should be
sacred, including "Learn C The Hard Way" - and a "question everything and
everyone" mindset is generally a good thing to have as a programmer [I think I
read that in "The Pragmatic Programmer" ;-)]. But other than that I now think
that chapter is fine. I do apologize for the somewhat passive-aggressive form
of my question.

Thanks a lot!

------
mhartl
It's worth noting that this isn't a standalone article; it's lesson 55 in
"Learn C the Hard Way". In other words, this isn't meant to be a thorough
critique. It's just one lesson of many.

------
cbs
Guys, no need to jump all over this, clean code and defensive programming are
solid practices to advocate. "That's just the way we do C" isn't really a good
defense to someone trying to point out the warts in the way we do C, even if
it did come from K&R.

It's easy to point to this for not sufficiently proving its argument, but it's a
work in progress, you're not the target audience, and finally, some of the
shit in K&R seriously would not pass a code review here, and probably wouldn't
where you work either.

------
MattyDub
I think people are misunderstanding this; it's pedagogical. From the link:

'I want to use it as an exercise for you in finding hacks, attacks, defects,
and bugs by going through "K&R C" to break all the code...When you are done
doing this, you will have a finely honed eye for defect.'

He's trying to train his readers to think about places code can break.

------
JeremyBanks
This is just the first section of the chapter, right? It seems a little
premature to be distributing or discussing it here.

~~~
calibraxis
Good point. But that's the most inspiring approach I can remember ever seeing
in such a book. If only all of my technical books were written with such a
mindset.

~~~
zedshaw
Thanks, I do like breaking things.

------
tylerc230
"Before the switch to heap memory, this program probably ran just fine because
the stack allocated memory will probably have a '\0' character at the end on
accident."

Stack memory is not preinitialized to '0' either so using stack memory would
show this 'defect' as well (as others have pointed out this is a defect in the
calling code, not in copy).

~~~
apu
I think that depends on compiler, although it's been a while since I've done
C, so I might be wrong.

~~~
zedshaw
It's fairly vague in the standard, but most of the time this is what happens
so nobody hits the bug. Also, many OS are notorious for being loose with the
stack and strict on the heap for various reasons.

------
Roboprog
To be honest, I was never that impressed with the K&R book as an introduction.
Coming from a higher-level language (Pascal) back down to C in college, after
several years, was painful. K&R (borrowed from a classmate) presumed too much
obviousness in what assembly language constructs were being abbreviated by a
given piece of C code, and what the other pieces of C code around the example
to be explained did.

Later in the semester I picked up a copy of the first edition of this book:

<http://books.google.com/books/about/C_an_advanced_introduction.html?id=QtjsnjT2BjwC>

This book did a much better job of explaining what the hell was going on,
particularly in regards to managing memory / strings / arrays / (C) pointers
and understanding their placement.

Sure, Pascal had pointers, when you asked for them :-)

Yeah, throw the K&R book at newbs you hate.

------
jemfinch
Isn't "assert(line != NULL && longest != NULL && "memory error");" a bug in
Zed's example code?

~~~
tba
No? It ensures that malloc didn't return a NULL pointer and the '&& "memory
error"' is a common pattern to add a comment describing why an assert()
statement failed.

~~~
kisielk
One problem with it is that on most compilers if you compile with
optimizations it will remove the assertion. Now the code is no longer guarded
against malloc failures and will just segfault. If the intent is to teach
people how to handle malloc failures gracefully, it's not that great of an
example.

~~~
CJefferson
Why do you think the compiler with optimizations will remove the assertion?

The compiler will realise that:

    
    
        assert(line != NULL && longest != NULL && "memory error");
    

Is equivalent to:

    
    
        assert(line != NULL && longest != NULL);
    

But there doesn't seem to be anything wrong with that.

~~~
tptacek
Production code is supposed to define away the assertions. It is an actual
error to manage important program state with asserts.

(That doesn't make these particular asserts an error).

------
richardk
Myself, I learned from K&R, and I thought it was great at the time, now I'm
less sure of this.

I've heard that "A Book on C" is supposed to be a much better book for
learning C.

------
perfunctory
Not only the book "needs to be taken down from its pedestal" but it's time for
the C language itself to go. It was a fine language at the time (better than
assembly I guess). No more. Not in the 21st century. The most common excuse
for sticking with C is efficiency - I pity those who still believe so. Another
excuse is extended set of libraries - <http://go-lang.cat-v.org/pure-go-libs>,
when Go was publicly released?

That is, stop writing books about C and start creating new languages.

~~~
perfunctory
A lot of C lovers I see.

------
16s
The greatness of the book is its terseness. Some of the code examples may be
dated and buggy, but that does not diminish it at all.

~~~
bonzoesc
Dated, buggy, and un-stylish examples are exactly what diminishes it.

~~~
tptacek
And those dated, buggy, and unstylish examples would be?

~~~
bonzoesc
I dug out my 29th printing copy for this :)

Dated: the "Hello world" on page 6 produces warnings when compiled "> gcc
-ansi -o hello hello.c -Wall" c.f. <http://c-faq.com/ansi/maindecl.html>

Buggy but only to a pedant with infinite time: the word count on page 18 that
uses a `double` which will run out of accuracy once you count more words than
there are atoms in the universe. ( `double toot = 1000000000000000000;
toot++;` what is the approximate value of toot?)

Buggy by omission: Section 5.5 (on page 104) doesn't really go into the
hazards of C strings, and the text's implementation of strcpy when using
strings from unfriendly places (public wifi, quicktime files) can turn the
interesting kind of nasty if you hammer on it enough.

Unstylish: brace-less if statements in many examples; a two-line one on page
110, for example.

I worked as a contractor in a C shop for several months a few years back, and
I kind of see what happens when you code K&R-style instead of extremely
defensively and conservatively. For example, the strcpy on page 106 is cute,
but is it going to be as maintainable as the naïve version on the previous page,
or the hypothetical (unless I just failed to find it) LCTHW known-length
string copy function?

~~~
tptacek
1\. You're singling out the first "hello world" program in the book which is
making an effort to show C using the fewest concepts necessary; flip elsewhere
in the book and notice the other main() functions return int values. (The
default "int" is C89; the "warning" is for C99; the second edition of K&R is
ANSI C89, not C99).

2\. First the nit misses the spirit of the example which is simply to
demonstrate that "double" has higher precision than the fixnum "long" type;
second, calling it a bug misses the fact that text explicitly does _not_ say
you can use it to count arbitrary strings; as the text says, it depends.

3\. You're calling "buggy by omission" an implementation of strcpy not
substantially different from BSD's libc. K&R's is, in fact, how you would
write strcpy(). That strcpy() doesn't handle hostile strings is irrelevant;
you're not intended to use strcpy() on hostile strings.

4\. Hard to fathom how one criticizes a book on C for braceless "if"s. We
could go toe to toe on high-quality C programs and whether they ever use them;
nginx, for example, avoids them; dovecot doesn't. Both are exceptionally high
quality C codebases.

Not that you have anything to prove to me, but I reject all your examples, and
think you were too cavalier about judging K&R.

But I appreciate the response (and thus the opportunity to piously respond to
it). :)

~~~
bonzoesc
> Not that you have anything to prove to me, but I reject all your examples,
> and think you were too cavalier about judging K&R.

I don't think it's necessarily cavalier. If I was asked to review code that
aligned with the K&R examples today, I'd kick it back.

I'd much prefer a codebase that compiles with -Wall -Werror today than one
that doesn't, one that doesn't use floats for integers, one that doesn't use
strcpy even from a literal string to the heap, and one that doesn't put a
braceless `if` inside a braceless `for`, and I suspect you would too.

It's a good book to learn 1988 C, and if you're just going to learn it for
your classes and go off into the exciting world of Java and C# it's probably
fine, but it's not the right book to teach someone to write production-quality
C.

~~~
tptacek
If you reject code that, for instance, strcpy's string literals (because they
could have used strncpy to be extra safe), you're rejecting most professional
code today.

I _wish_ it was a best practice that strcpy() was never used, because it would
make static source code analysis a lot easier: see strcpy()? Flag it! But no:
lots of excellent C code properly relies on the assumption that string
literals don't change their size at runtime.

Similarly, yours is a stylistic standard for braces that _rejects OpenBSD
KNF_. Good luck with that. You're entitled to an opinion and, on your own dev
teams, it's perfectly reasonable to demand consistency with an "always use
braces" style. But it's not reasonable to call style fouls on other people's
code that adheres to style(9).

Really strong disagree that K&R isn't a good first book for writing production
C. I could go on & on, but since I'm echoing the commanding majority of all C
programmers in sharing that sentiment, there's probably no need.

------
halayli
The function copy expects a C style string but he passes it a buffer of arbitrary data.
If you pass it the proper argument you'll get a proper result, it's that
simple.

I see the complaint is more about C style strings in general because it's easy
to corrupt them by overwriting the NUL at the end rather than a complaint about
"K&R C" book. I don't get why he decided to put it under this title.

~~~
zedshaw
The problem is in the word "expects". This function expects certain things but
doesn't assert that they're happening. Because of this you get bugs when you
try to use that function in other code, which is what beginners do.

And yes, the solution is to require a size with every string buffer, which is
what I do nearly all the time now. But, the best way to learn _why_ that's
the solution is to try to hack something with this kind of defect.

~~~
halayli
I think the core issue here is how C style strings are implemented.
Personally, I'd much prefer it if C strings were <size><data>, but what I was going
after is that it's not K&R specific, it's a C design "mistake".

------
gonzo
I'm sure I'll be downvoted, but C is not Java.

From what I can tell, the "it's time for C to go" comes from a generation
raised on Java.

C still has its place.

------
squiggly101
Kind-of a long-winded way to point out the (very well known) fact that passing
non-terminated strings to functions that expect terminated strings is fraught
with danger, isn't it?

~~~
zedshaw
Nope, and what I'm actually doing is fuzzing the input to this function, which
is a common breaking technique.

You see, programmers tend to go around only using functions exactly as they
were intended. Hackers come along and then use them in unintended ways, which
causes bugs. The way to prevent this is to think like a hacker and try to use
your code in unintended ways and then try to prevent them.

~~~
gonzo
You say 'hackers' like it's a bad thing. As a reminder, this is "Hacker News".

