
Deconstructing K&R C (2010) - ColinWright
http://c.learncodethehardway.org/book/krcritique.html
======
arielby
`safercopy` also does not always terminate, say if one of the passed pointers
points to invalid memory, which then segfaults when accessed. Similarly, you
can have problems if the destination buffer overlaps the saved return address
on the stack of another thread.

C's type-system is unsound, and even if it were sound, it is too weak to
express interesting invariants; certainly it is far too weak to express
invariants required for termination. In fact, no popular language can express
the invariant "this doubly-linked list doesn't contain a cycle", which is
essentially required for termination when working with these.
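
When the type system cannot carry the "no cycle" invariant, the usual fallback is to check it at runtime before traversing. A minimal sketch using Floyd's tortoise-and-hare check, assuming a hypothetical `struct node` and a helper named `list_terminates` (neither comes from the article or from K&R):

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical node type, for illustration only. */
struct node {
    struct node *prev;
    struct node *next;
    int value;
};

/* Floyd's tortoise-and-hare check along the next pointers.
 * Returns true if following next from head eventually reaches NULL,
 * false if the chain loops back on itself. */
bool list_terminates(const struct node *head)
{
    const struct node *slow = head;
    const struct node *fast = head;

    while (fast != NULL && fast->next != NULL) {
        slow = slow->next;          /* advances one node per step */
        fast = fast->next->next;    /* advances two nodes per step */
        if (slow == fast) {
            return false;           /* the cursors met: there is a cycle */
        }
    }
    return true;                    /* the fast cursor fell off the end */
}
```

This only checks the `next` direction, and it still assumes every pointer in the chain is either NULL or points to a valid node, which is exactly the kind of property C cannot verify for you.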

Even ignoring termination, in essentially all languages, including C, there
are properties like "this object isn't going to be accessed by another thread"
which can't be expressed in the type system, or even checked at runtime, but
are required for any interesting kind of correctness.

~~~
Certhas
Doesn't Rust's linear type system ensure the latter?

~~~
arielb1
Functional languages, along with Rust's linear type-system, can ensure this
for single-ownership structures (trees, singly-linked lists), but not for
structures with more complex ownership, like doubly-linked lists. The
unpopular languages I was talking about were Agda and friends.

~~~
pcwalton
To be clear, though, Rust's type system can ensure thread safety of doubly-
linked lists if you use the doubly-linked list structure in the standard
library (or write one yourself using RC/etc.)

------
zedshaw
Well this is fun, folks, but frankly, you all have proven just how conservative
programmers are. My chapter advocating something as "revolutionary" as
including the lengths of strings when you process them has received more hate
mail than anything I've written. And I'm the guy who gets death threats
because I don't like Ruby.

If you are reading this and saying, "This guy's wrong about including the
lengths of strings!" Then I ask: Why are you using _any_ modern language?
Nearly every language in use today includes the lengths of strings and backing
buffers in their string implementation, and you all use it and appreciate it,
but when I advocate it suddenly it is heresy and deserving of vitriol.

I personally don't care what you all think, which I know insults your pride at
being the smartest people in the room, but until you are willing to admit that
your hatred of that chapter is based entirely on nostalgia and not on the
merits of my fairly simple claim that K&R (or any) C code is error prone, then
you are not going to advance the state of the art any further than the 1970s.

This is what makes me sad about computing today. You are all desperately
clinging to the notion that you are radical free thinking futurists while you
desperately cling to reverence for the past and are incredibly resistant to
any change. If something as simple as a thought experiment to look objectively
at how things are done and try something better makes you angry, then you are
not free thinking futurists.

Anyway, enjoy your day folks. I'm going to go do some painting.

~~~
johnchristopher
Hi (sorry for hijacking the thread),

Is there a plan for a French version of learning the hard way in the future?
That would be awesome.

~~~
zedshaw
That'd be up to my publisher (A/W) but I do believe they roll out French
versions of my book depending on market demand. How that demand happens I have
no idea.

~~~
johnchristopher
Thank you.

------
deng
This needs (2010) in its title and was discussed here three times already:

[https://news.ycombinator.com/item?id=5012432](https://news.ycombinator.com/item?id=5012432)
[https://news.ycombinator.com/item?id=4095294](https://news.ycombinator.com/item?id=4095294)
[https://news.ycombinator.com/item?id=3448573](https://news.ycombinator.com/item?id=3448573)

~~~
ColinWright
Thanks for that - I've edited the title.

I hadn't seen the previous discussions, but the last one was two years ago, so
maybe it's worth having this here so people can add any new thoughts or
comments.

------
Osmoticat

        Many people have looked at this copy function and
        thought that it is not defective. They claim that, as
        long as it's used correctly, it is correct. One person
        even went so far as to say, "It's not defective, it's
        just unsafe." Odd, since I'm sure this person wouldn't
        get into a car if the manufacturer said, "Our car is
        not defective, it's just unsafe."
    

Terrible analogy.

A rapidly spinning piece of steel is unsafe, flammable liquids are unsafe,
sparking hot metal is unsafe, cars contain all of them.

A copy function is obviously a component, and not a whole car.

~~~
zedshaw
Two links. First, a tiny component can be the equivalent of a spinning piece
of unsafe steel:

[http://heartbleed.com/](http://heartbleed.com/)

And a small component is all it takes to make a vehicle unsafe:

[http://www.gmignitionupdate.com/](http://www.gmignitionupdate.com/)

------
bithush
For anyone who has picked up a copy of K&R wanting to learn C and hit a wall I
highly recommend C Programming: A Modern Approach, 2nd Edition by K N King.

Having read many books on C over the years I think King's book is without
doubt the best intro book to C. K&R is great but it is a reference more than a
tutorial in my opinion and is better to read once you have the basics of C
(and programming) under your belt.

~~~
MrBuddyCasino
Costs 100€+ on Amazon. Does anybody know if this is available as an ebook?

~~~
ANTSANTS
_cough_ library genesis _cough_

------
kwhitefoot
I think everyone is failing to see the wood for the trees. The great thing
about "The C Programming Language" is that it is short and to the point. It's
a great way to learn about many aspects of programming. I don't think
anyone recommending it would ever expect it to be perfect; I recommend it to
people in the hope that they will think about the business of coding and I
suspect that Kernighan and Ritchie hoped the same. I agree that it is a shame
that the code examples have errors; perhaps those who have the time and
expertise should contact the publishers to point them out so that the 43rd (or
whatever it is now) printing stands a chance of being correct.

It's a great pity that this discussion has descended into an exchange of
insults.

------
brghts
Nit: I would suggest not using assert() for checking error return values.
Compiling the program with -DNDEBUG disables the check. assert() is for
finding bugs in your program. A malloc() call returning NULL is not a bug.
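
A minimal sketch of that distinction, assuming a hypothetical helper named `dup_string` (it does not appear in the article): the caller bug is caught with `assert()`, while the allocation failure is handled with an ordinary `if` that survives `-DNDEBUG`.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns a heap copy of s, or NULL if malloc() fails.
 * The caller owns the result and must free() it. */
char *dup_string(const char *s)
{
    /* Passing NULL is a bug in the caller, so assert() is appropriate here.
     * This check disappears when compiled with -DNDEBUG. */
    assert(s != NULL);

    size_t len = strlen(s) + 1;     /* include the terminating '\0' */
    char *copy = malloc(len);

    /* Running out of memory is a runtime condition, not a bug,
     * so it gets a real check that is never compiled out. */
    if (copy == NULL) {
        fprintf(stderr, "dup_string: out of memory\n");
        return NULL;
    }

    memcpy(copy, s, len);
    return copy;
}
```

A caller would then test the result before using it, for example `char *p = dup_string(line); if (p == NULL) { /* clean up and bail out */ }`.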

~~~
zedshaw
I know, that's why I have these:

[http://c.learncodethehardway.org/book/ex20.html](http://c.learncodethehardway.org/book/ex20.html)

But, when talking about this I didn't want to muddy the waters with my own
assert alternatives. Programmers have a hard enough time focusing on the issue
of a for-loop vs. a while loop.

Edit: malloc returning NULL means you're out of heap. That's catastrophic in
almost all cases, but the important part is that people don't
_detect_ that, then use the NULL pointer. That's the bug.

~~~
mjschultz
In a chapter about deconstructing someone else's book, your own book also does
dangerous things according to yourself:

> The problem is, as with _every_ book with code ever in the universe,
> beginners will copy that code out and use it somewhere else and then the
> function is wrong.

Yet, if a beginner copied some of the code in this chapter they'd have the
exact bug you are talking about here (using the NULL pointer returned from
malloc()).

I think you should probably expand that into the safer checks (don't forget to
free(line) if longest is NULL too!).

~~~
zedshaw
Are we looking at the same code? I use an assert to quickly check for NULL,
and this is a simple example of how to work with the function. Other parts of
the book use more extensive error checking and I use my debug.h macros quite a
lot, but if you find bugs feel free to email me them. I'd really appreciate
it.

~~~
mjschultz
I'm referring to this code on the originally linked page [1] (you'll have to
scroll back a bit because your header blocks the content).

In the context of this thread, brghts states that this is dangerous because if
you compile with -DNDEBUG the assert is optimized away.

So if I copy that code with the assert statement, it will be optimized away
and your code no longer performs the NULL check. This is bad.

As you mention, beginners tend to copy code off the Internet and cause bugs.
If you recognize this and claim to be teaching people you should not use bad
practices in your example code. Period.

If you don't want to muddy the waters with your custom debug macros, then you
should still play it safe when checking return values in code that a beginner
may simply copy and think is correct.

[1]:
[http://c.learncodethehardway.org/book/krcritique.html#code--...](http://c.learncodethehardway.org/book/krcritique.html#code--krc--1.9-1.c-pyg.html-24)

------
theseoafs
> That means the for-loop will never loop forever, and as long as it handles
> all the possible differing lengths of A and B, never overflow either side.
> The only way to break safercopy() is to lie about the lengths of the
> strings, but even then it will still always terminate.

So what? The function is still crazy dangerous. Buffer overflows are still
possible, and therefore buffer overflow attacks are still possible. It's still
possible (and very easy) to trigger undefined behavior, and it's meaningless
to make claims about whether a function terminates if it triggers UB, and
there are as a result still security concerns.

This isn't a slam on Zed Shaw's book, as much as it is a slam on C. You can
adopt whatever heuristics you like, but at the end of the day you're always
working in C, where it is possible for your program to self-destruct (or
worse!) unless your function inputs are _exactly right_.

~~~
zedshaw
Curious, did you find a bug I'm not seeing? How do you think an alternative
copy function that uses lengths would have buffer overflows?

~~~
theseoafs
You said it yourself:

> The worst possible scenario for the safercopy() function is that you are
> given an erroneous length for one of the strings and that string does not
> have a '\0' properly, so the function buffer overflows.

Your argument is that the safercopy() function is "safer" in that it is
guaranteed to terminate regardless of whether the underlying buffer has a NUL
byte in it. While that's true, it's sort of missing the point a bit, I think.
The unsafe copy() function wasn't primarily unsafe because there was no
guarantee it would terminate -- it was unsafe because it corrupts a bunch of
memory, exposing you to a wide range of insecurities (the least of which is
that your program might crash). safercopy() is still prone to that behavior if
the lengths you pass in aren't accurate. While it is guaranteed to only
corrupt _n_ bytes of memory rather than an arbitrarily large number of bytes,
the damage might as well already be done by the time you corrupt those _n_
bytes. So to answer your question:

> How do you think an alternative copy function that uses lengths would have
> buffer overflows?

It's unsafe in exactly the same way that the unsafe copy() function is: if the
arguments you pass into the function are incorrect/don't point to the data you
think they point to, you'll corrupt memory. Now, you could make the argument
that it's much easier to just remember the lengths of all the buffers you
allocate than it is to remember to NUL-terminate all your C strings -- I would
agree -- but I don't know if the article does a great job of explaining that.
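
To make the trade-off concrete, here is a rough sketch of a length-aware copy. It is not the article's `safercopy()`; the name `bounded_copy` and its signature are assumptions made for illustration. The loop bound comes only from the sizes the caller passes in, so it always terminates, but it is memory-safe only if those sizes are truthful.

```c
#include <stddef.h>

/* Hypothetical length-aware copy; NOT the article's safercopy().
 * Copies at most dst_size - 1 bytes from src into dst, always writes a
 * terminating '\0' when dst_size > 0, and returns the number of bytes
 * copied (not counting the terminator). */
size_t bounded_copy(char *dst, size_t dst_size,
                    const char *src, size_t src_len)
{
    if (dst == NULL || src == NULL || dst_size == 0) {
        return 0;                   /* nothing can be written safely */
    }

    /* The iteration count is fixed before the loop starts and never
     * depends on finding a '\0' inside src. */
    size_t limit = dst_size - 1;
    if (src_len < limit) {
        limit = src_len;
    }

    for (size_t i = 0; i < limit; i++) {
        dst[i] = src[i];
    }
    dst[limit] = '\0';

    /* If dst_size or src_len overstate the real buffers, the loop above
     * still reads or writes out of bounds: bounded, but not harmless. */
    return limit;
}
```

With `char buf[4]`, the call `bounded_copy(buf, sizeof buf, "hello", 5)` copies three bytes and terminates the buffer; passing a `dst_size` of 400 for the same buffer would overflow it, which is the case both sides of this exchange are arguing about.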

~~~
zedshaw
> It's unsafe in exactly the same way that the unsafe copy() function is

No, that's the logic error every programmer makes. The copy() function is
_always_ wrong, because it can't confirm that the string has the right length
without looking at the string which causes the error.

With my function I can go to as great a length as I want to confirm that the
string is actually as long as I say it is. I can't mitigate every possible
error of misuse, but the errors safercopy() can have are much smaller than
copy().

Your argument is effectively stating that because you can exploit one with a
general "UB" error, that it's the same size and classification of errors as
with the other. That's invalid, and proven in my writing.

------
amelius
I find it, above all, baffling that since K&R, we still write language specs
in prose, rather than in a formal language...

~~~
6581
[https://en.wikipedia.org/wiki/Backus%E2%80%93Naur_Form](https://en.wikipedia.org/wiki/Backus%E2%80%93Naur_Form)

~~~
marktangotango
BNF can specify the syntax, but not the semantics, i.e. the meaning. There are
several methods of specifying the semantics[1], but it is notoriously
difficult to completely specify the semantics of a language, with some
languages more suited to a formal definition than others. I believe Haskell is
the most commonly used language today with the best formal definition.

[1]
[http://en.wikipedia.org/wiki/Semantics_(computer_science)#Ap...](http://en.wikipedia.org/wiki/Semantics_\(computer_science\)#Approaches)

------
slkpg
The code in K&R looks correct to me as written. The copy function is only
called on char arrays obtained from getline which properly null terminates the
array. As pointed out, C style "strings" do not work if the null terminator is
not present. If underlying assumptions are not valid in any context, nothing
can be assumed to work. The text about malloc vs stack is both wrong and not
relevant.

~~~
zedshaw
Thank you for proving my point. It is right as long as the tower of babel it's
used in never collapses. The problem is, as with _every_ book with code ever
in the universe, beginners will copy that code out and use it somewhere else
and then the function is wrong.

And, you literally just repeated every single argument I demolish in the
chapter, which means you didn't read it.

------
microcolonel
His "Stylistic Issues" section also leaves a lot to be desired.

The second if statement is clearly not part of the loop structure. It might be
possible to throw somebody off (if they don't know how this stuff works) if
you were to indent the second if statement, but we don't, and your
linter (you're using one, right?) will also complain.

~~~
zedshaw
I'm sure the people who wrote gotofail
[https://www.imperialviolet.org/2014/02/22/applebug.html](https://www.imperialviolet.org/2014/02/22/applebug.html)
would disagree with you on how "clearly" you can see those kinds of braceless
structures.
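
For readers who have not seen that bug, here is a simplified sketch of its shape. The function names and return codes are hypothetical and this is not Apple's code; it only shows how a duplicated line under a braceless `if` changes control flow while the indentation suggests otherwise.

```c
/* Braceless version: the second "goto fail" is indented as if it belonged
 * to the first if, but it runs unconditionally, so the second check is
 * never reached. */
int verify_braceless(int rc1, int rc2)
{
    int err;

    if ((err = rc1) != 0)
        goto fail;
        goto fail;              /* always executed; err is still 0 here */
    if ((err = rc2) != 0)       /* unreachable */
        goto fail;

fail:
    return err;
}

/* Braced version: a stray duplicate inside the braces would stay guarded
 * by the condition, and both checks run. */
int verify_braced(int rc1, int rc2)
{
    int err;

    if ((err = rc1) != 0) {
        goto fail;
    }
    if ((err = rc2) != 0) {
        goto fail;
    }

fail:
    return err;
}
```

Here `verify_braceless(0, -1)` returns 0 even though the second check should have failed, while `verify_braced(0, -1)` returns -1. Compilers can catch this too: GCC, for instance, has a `-Wmisleading-indentation` warning aimed at this pattern, which supports the point about linters above.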

------
stefantalpalaru
Relevant Lobsters thread:
[https://lobste.rs/s/kwrape/deconstructing_k_rc](https://lobste.rs/s/kwrape/deconstructing_k_rc)

------
mikeash
What a terrible article. I feel dumber having read it. It's one of those
articles that sounds good and takes more effort to correct than it took to
write, and thus causes a net loss.

Just one example: "That gives every path to this function a 50% to 75% chance
it will fail with just the inputs above." I mean, what kind of nonsense is
this? This is pretty much saying that the odds of winning the lottery are 50%,
because either you win or you don't.

Most of the article is spent on discussing the perils of a strcpy
reimplementation. Yet there's no discussion of a fix. The author simply
assumes the existence of a magical "safercopy" function that behaves correctly
on all inputs. Of course the author doesn't provide the implementation of this
function, because if he tried to write it, he'd discover that it's impossible.

In short: flagged for being awful.

------
informatimago
This is so dumb.

    
    
       safercopy(42,a,33,b);
    

Just use lisp!

~~~
SixSigma
How do I link LISP into my C project to do this safer copy you recommend?

~~~
jwdunne
First, you must build a half-baked version of Common Lisp in C.

