Massacring C Pointers (wozniak.ca)
426 points by signa11 on June 26, 2018 | 285 comments

A few years ago, I saw a classroom video (I would guess ~8-10 year olds, in the American public school system) as a demonstration of a teacher's technique called "my favourite wrong answer". She would have them solve a problem (a math problem, in this particular case), collect the answers, show the distribution of the answers, and then pick out one answer (possibly with the student's name redacted) as her "favourite wrong answer". And then she would work through why she liked it so much with the students. In the examples, it was cases where the person was on the right track, but then made an incorrect, but justifiable by _some_ standard, step in solving it.

It seemed like a good way to really try to understand where someone's comprehension broke down. It felt like it added some legitimacy to the students who got the answer wrong, if others could look at it and go "oh, I see why you thought that!" instead of just "wow try to get it right next time". I believe it's part of what good teachers (not limited to school educators, mind you) are doing all the time: looking for the student's gaps and trying to correct those, instead of just repeating the lesson that has already failed to stick.

I guess this just makes me think of that teacher, trying to work out what her pupils were misunderstanding, by looking at their answers.

Where I was teaching, we started doing code review in the classroom in basically the same spirit.

The number of students prevents any real selection; instead, we take exercises that failed the tests (and that are not empty), explain what was wrong, identify common mistakes or bad style, and try to make the code correct.

We had very good feedback on this, and we saw an evolution in the students' code. It was far better than trying to present a collection of bad practices ...

This is some very good advice! I am teaching a C++ class and am struggling with the quality of the code my students are producing. Their programs output the correct answer, yet the style is really lacking. Periodic code reviews of some selected problems would be extremely useful, thank you for the tip!

> I am teaching a C++ class and am struggling with the quality of the code my students are producing. Their programs output the correct answer, yet the style is really lacking.

I remember trying to help fellow students with their C and C++ assignments back in undergrad, and reading some of their code I got the distinct feeling that most of it was “random-walked” code. Like they tried writing things that looked like C, making random, iterative changes until it compiled. Then once it compiled, made random, iterative changes until the output approached what the expected output would be. Sometimes the functionality ended up right, but with bizarre stuff like: unused variables all over the place, variables assigned then reassigned without using the first value, uses of pointer indirection that look like the student kept adding or removing asterisks until it didn’t crash, elaborate class hierarchies that went unused, etc. It’s like the Infinite Monkeys Eventually Producing Shakespeare thing but with programming assignments.

Yes, that's exactly my own experience. Moreover, we are forced to use ROOT [1] to compute histograms and create graphics: its weird use of OOP makes the students think that they have to use classes even to implement simple functions (e.g., printing a list of values).

[1] https://root.cern.ch/

> uses of pointer indirection that look like the student kept adding or removing asterisks until it didn’t crash

Oh c'mon, that's practically a rite of passage when learning C or C++ for the first time, especially if you're new to programming as a whole :-)

Now you're making me think about my specification and verification professor from years ago. A lot of the exercises had us specify stuff with some formalism. Something like a finite state machine for a 3 digit combination lock, or pre- and post conditions of algorithms, loops and such.

The tutorials about these exercises were really cool, because we'd spend only a short time clearing up mistakes in the notation. But then he usually classified and grouped the different solutions according to the design decisions made, and it would evolve into long discussions about the tradeoffs of those choices.

Someone might go "Well I figured the FSM would be smaller if I fail on the first wrong number, and it'd be user friendly because it fails early", and someone objects "But that's insecure!" and someone else figures "And also it'd be much easier to write up a control circuit that always expects three digits" and eventually everyone is confused if we're looking at the user input as a stream of tuples of three digits, instead of looking at the last three digits in a stream of digits.

It was a very instructive class.

The issue with really learning computer science outside the top, well-known schools is that you're basically on your own for your education (or worse, if you're taught wrong). Open source software teaches you some things, but usually by trial by fire.

I went to an unknown state school and I got a damned fine education. Top schools don’t have a monopoly on good teachers or good students, not even remotely close.

Really? I went to a non-top school and putting aside the formal education component, I felt like I also had plenty of co-learning. That is, I remember many study groups where we helped each other learn, including for programming.

I certainly didn't feel like I was on my own.

Similar experience here, my university helped people form study groups near final exams and had a "homework club", where final year and grad students were paid to help first and second year students with their assignments for a couple hours a week, and people were encouraged to meet each other in that club and find "study buddies" (but no-one actually called it that, because ugh)

It covered everything from programming to multivariable calculus and data science (though it was much harder to find students confident enough to teach others for those last two).

> really learning computer science outside top well known schools

Eh... no. It's not 1975 anymore. This knowledge is widely distributed.

Moreover, the "on your own for education" bit, even if true, is a bit dishonest. Being "on your own" with this in 2018 is absolutely incomparable to how it was 10 years ago, much less 20 or more. The amount of material accessible for free or extremely cheap is mind-blowing to people who had to learn programming from a single (physical) copy of Stroustrup's "The C++ Programming Language".

When I contemplate the resources available to a starting programmer today, I'm stunned.

That isn't totally wrong, but I think you drew the line in the wrong place. There are numerous less-known schools that are OK, but a typical community college or liberal arts school isn't going to deliver the goods. State universities are sometimes good.

When I started my CS degree the school was transitioning from teaching C++ to teaching Java, and the state of instruction in C++ was almost as bad as these examples. I had a professor who wanted us to use "new Foo()" everywhere in our code (even local / static variables) because it, "gets the students ready for Java." No matching delete, of course, or mention of RAII. We were supposed to "pretend" we had a garbage collector. By that logic one might prepare students for a course in Spanish by speaking English with a "-o" on the end of every word.

On one of the early homework assignments I realized my professor misunderstood how pointers work - he seemed to believe that (re)assigning pointers created chains rather than changing what the pointer points at. I.e. given "int x,y; int_pointer a,b,c; a=b=c=&x;" he seemed to believe that then executing "c=&y;" would also cause "a" and "b" to point at "y".

I spent the first page of my turned-in assignment excoriating his lack of understanding. Then I presented a class template that implemented a special smart pointer which did behave in the unusual way he seemed to think C pointers work, so that I could write the code exactly as he presented in the assignment and make it actually work.

In retrospect I could have been nicer and considered other pedagogical factors beside technical correctness. I think he took it with more grace than I deserved.

Some of my professors also have ridiculous code standards or exam standards. One of them teaches "algorithms and data structures", but he's not allowing us to even use a break or continue in our loops because he thinks it's "bad practice". We end up with a nested hellhole. Another example from the same professor (correct me if I'm wrong on this as I'm not a C++ expert), but he's constantly inheriting from unique or shared pointers in his classes. Like why though? Why do you have to complicate your code like this? Why can't you just have a normal class with unique or shared pointers in it?

Lastly, and this is taught by another professor, is my class for Linux. It includes bash and C++ programming and focuses a bit on the POSIX API. Our exam was last week and we had 3 parts. The first part was a theory exam by the professor that taught the class. It was about bash/POSIX commands and very little Linux specific stuff. He expected us to know all of the options from all of the commands you can think of: cut, split... You had to know all of them and off the top of your head. It's ridiculous. I'm pretty sure I failed that one, but I absolutely nailed the next part which was about bash scripting (where you could use if statements and the like, the first part did not allow it, only redirection and piping). The man is mad (hah man, that's what I needed during that first exam part).

>Another example from the same professor (correct me if I'm wrong on this as I'm not a C++ expert), but he's constantly inheriting from unique or shared pointers in his classes. Like why though?

You are not wrong. This completely breaks my brain. I really would love to hear the professor's explanation for this.

The same thing as people inheriting from standard containers. It does not make sense!

It's like people see inheritance and then forget that containment is a thing.

> he's constantly inheriting from unique or shared pointers in his classes

That happens when you learn OOP from a bad teacher or textbook. Far too many books focused entirely on class hierarchies, as if an inheritance diagram were an almost-completed program. That leads to a weird kind of brain damage where you think all problems can and must be solved by inheriting from something. It was the STL that taught people that algorithms matter more than taxonomies, but clearly some professors never got the memo.

Clearly, by banning break and continue, he was hoping that you would discover setjmp/longjmp.

> but he's not allowing us to even use a break or continue in our loops because he thinks it's "bad practice".

I will never in a million years understand why some people believe this.

> he's not allowing us to even use a break or continue in our loops

You can use a goto then... (yeah, even worse practice, but as long as it's allowed)

I worked on a real program developed this way. It was an Android/iOS tablet application with a large part done in C++. The C++ part was somehow generated from a Java prototype, and development was then outsourced to India. Those guys had probably never seen C++ before, so they assumed the program was correct and used the same style to extend the application. US managers were just filling reports with green/yellow/red markers, preparing for release and having no clue about the real state of the app. I worked for another contractor that was also on this project (there were about 50 people working on a small tablet app). I tried to fix those issues, but it was really too much work, and some guys just continued to break the program while genuinely believing they were doing a good job. So I decided to find another job and never work for a huge US corporation again.

I have at times worked with people who, when the TFS task said, "implement this feature", interpreted it to mean "write code vaguely resembling this feature and mark the task complete" but not "test the code works" or even "run the code at least once". It was, apparently, someone else's (whose?) responsibility that the program as a whole actually work. I tried explicitly writing "and write the test and make sure it passes" in the task, and others on the team successfully argued that that is two tasks (we were rated on task completion). But when I made that a separate task, often a different person would end up assigned the task of fixing the result of the other. Anyone who cranked through a lot of the former was rewarded by management for getting so much work done, those who undertook the latter were seen as wasting time perfecting something that was already (supposedly) "done".

> I think he took it with more grace than I deserved.

That kind of righteous fury is very common in high school and undergrads. Teachers have to perform in front of a tough crowd.

> Teachers have to perform in front of a tough crowd.

Certainly true. OTOH, there's hardly anything worse than a teacher with Dunning-Kruger syndrome when it comes to conveying the essentials of low-level programming to newcomers, so I would say that such a reaction is justifiable to a certain extent (as long as the tone wasn't too aggressive).

Of course, it doesn't necessarily have to be the teacher's fault. I remember my first programming teacher in high school explaining in the very first lesson that, honestly, he didn't have much programming experience (being a math professor) and had just dabbled a bit with Pascal, but [sic] "the administration desperately were looking for someone to teach this course" and he didn't really have a say in it.

> I think he took it with more grace than I deserved.

You mean he gathered all of his students, colleagues, and superiors; officially admitted to faking knowledge where there was none, and solemnly apologized, promising never again to teach things he doesn't understand?

I somehow doubt it, even though teaching nonsense to eager youth from the position of authority is an offense damn close to sexual molestation of minors. Dealing with subtly (or not so subtly) twisted minds, and holes (and lies) in knowledge, of the corrupted students happens years later and is someone else's problem, but it's all caused by teachers like this. I don't see anything graceful in it, at all.

As others have mentioned, your choice of comparisons here is, at a minimum, badly-chosen. Not to mention, completely inaccurate.

> teaching nonsense to eager youth from the position of authority is an offense damn close to sexual molestation

You really think this?

> ... an offense damn close to sexual molestation of minors ...

Wow, that's remarkably offensive.

No, well, I mean it. I mean, the mechanism is actually similar: there's a "grown-up" (teacher) who uses their authority to convince a child (student) that something wrong (sex-related, or bad semantics of pointer assignment) is actually OK (and will come up on the next test), which then haunts that child (student) well into adulthood (job). Of course, the degree of harm done is on a completely different level, I'm not arguing that it's the same! But there are undeniable similarities in how this works. The comparison is obviously exaggerated, but by using it I want to change how teaching nonsense is seen, from a minor transgression to a major offense that it is.

I just can't understand why teachers are not judged by the same standards that lawyers and doctors are. The damage bad educators do is very real. It may not look bad on average, but in specific cases it can be really devastating to students' minds (vulnerable as they are).

Is it because people being taught don't (in general) vote? Or for some other reason? I don't know, but I don't think a bad educator should have an easier time than a bad doctor, who (at least in theory) would be removed from the profession outright if found out. Instead, bad teachers are left alone or sometimes transferred, and that's it. It looks incredibly similar to how some churches handle offenses of their priests.

If you're a teacher, please realize that you're partially responsible for the future life of many of your students, and start acting like it. Removing people who are not qualified (to teach) should be a priority in your school like it is in courts and hospitals. Please, stay cautious and vigilant, and don't let your colleagues tarnish the profession's reputation by teaching things that are provably (and, sometimes, obviously) wrong.

If you want to make an argument about instructors giving students incorrect information, make that argument. I might even agree with you.

If you insist on conflating it with sexual abuse of children, you are an idiot.

This doesn't surprise me, really. When I was at university, the programming textbooks we had were vile, nasty, and plainly incorrect. And those of us who dared challenge them by writing correct, robust code were penalised, as the staff teaching the course didn't properly understand the domain they were teaching, didn't have any real experience, and assumed we were doing it wrong. We learned quickly to approach education with a high level of scepticism, get our info from more than one source (textbook), and find out which ones were reputable and which ones were garbage.

Basically there was a monolith in the middle of the course, a crap textbook, and monkeys were praying to it as the authority on everything.

Along the lines of taking the authority of a book as a replacement for critical thinking... I was once hired to rewrite some assembly language spaghetti code a more-electronic-than-software engineer had worked on for two years. Ostensibly he was ordered to help me accomplish this, but he was more like what in court would be described as a "hostile witness".

I spent a couple of months "hacking", just familiarizing myself with the microcontroller and its instruction set, and the tooling. (In fact, I had no source code to look at during this time, as he would not surrender it!) When I finally got his source, I rewrote it to a 100% functional equivalent in two weeks, 1/5 the size without all the unnecessary control path duplication and without pointless register moves. At the time I found the way he had used register moves particularly puzzling because it resembled the way an un-optimized compiler might work.

Months later after I was no longer on that project, I got a call from him. He was very flustered and wanted to know where I "got" a particular sequence of instructions from. I was like, come again? He said, "It's not in the book. You used a sequence of instructions that's not in the Book." (The Microchip programming manual.) He asked about another block of 3 or 4 instructions - also not in the book (in the combination I used them in).

Slowly it dawned on me - I'm quite certain he didn't understand what any individual instruction "did". That whole level of abstraction didn't exist for him. He programmed in assembly, yes, but only using blocks of example instructions from the Book. Suddenly the pointless register moves made sense - he was acting as a human compiler, without an optimization step.

Years later I realized I should have asked him, "and how do you think the example code in the book was written?"

That seriously sounds like someone that has only learnt through rote memorisation. Absolutely great at taking tests and doing things by the book but completely unable to apply the things they should have learned to a problem that falls slightly outside the examples.

Great story though, even though it makes me shudder.

> We learned quickly to approach education with a high level of scepticism and get your info from more than one source (textbook) and find out which ones were reputable and which ones were garbage.

Ironically, I think this is one of the best lessons anyone can learn . . .

One of the reasons I don't consider a degree anything of value when evaluating candidates. It might be a baseline or it might be a red herring. Best to just assume a degree and no degree are equal and assess based on practical examples.

I would think a degree would be valuable, if for nothing else, then because it indicates a willingness to put in 4 years of work to achieve a challenging goal. Conscientiousness is the word, I believe. A university degree shows you have some minimum level of conscientiousness.

Probably correct. It's a very expensive way to prove one's motivation, however.

Reminds me of when I was 15 and the teacher we had the previous year had left and one of the maths teachers took over.

He did his best, but I taught myself faster than he could teach.

Though because I was on the CSE stream, even though I got a grade 1 they wouldn't let me do computing at A level - as CSE kids were supposed to leave at 16.

Your last sentence is intriguing to me. I understand all the words, and I think I get the gist, but I'm not familiar with any of it.

What is CSE? Google seems to think it's Certificate of Secondary Education. So, middle school -> early high school equivalent?

Grade 1? A level? Leave where, the school?

(I'm in the US, so that's probably the disconnect)

At that time there were two sets of exams taken at 16: O levels and CSEs.

The O levels were for kids in grammar and private schools, and the CSEs were for the kids who went to secondary modern schools and left at 15/16.

As I was dyslexic I was put in the CSE stream. Though I did get a grade 1 in maths and computer studies, which is the same as a pass at O level, I didn't get to do A-level computing.


Part of getting through an education program is learning the differences between the right answer and the expected answer. If the book says it is so, then it is so, even if you know it isn’t. Remember both “facts”. Keep one for the test and the other for reality.

Sounds very much like doublethink.

And that's why just going to the cheapest university is a bad idea.

My mid tier university has a fundamental problem where the lecturers aren't even hired to teach, they're hired for research. And since the university can't get enough American grad students to stay and be research slaves they bring in grad students from the third world. Then these grad students are required by law to teach a certain number of credit hours to conduct research and there you go. The result is every general class is poorly taught by people who can barely speak English and don't care about teaching so you end up teaching yourself or failing.

CS courses generally don't have this issue but they do have a large amount of mediocre professors. Reason being anyone talented in CS can make 2-3x the salary out in the real world and so CS grads rarely stick around to teach.

The saying "those who can't do, teach." is as relevant as ever.

This hasn't been my experience in CS, the research professors teaching courses were interested in teaching, the only thing the grad students do is grade assignments/exam papers.

> Reason being anyone talented in CS can make 2-3x the salary out in the real world and so CS grads rarely stick around to teach.

I would call bullshit on that, money is not the only thing that motivates people to do things.

The professors or graduate students hired to do research aren't the ones who end up teaching wildly incorrect material. That happens when you hire people specifically to teach, and don't pay them very much at all.

I can speak from my own experience; even though English is my first language, I entered as a grad student and TA in a public US university as a foreigner, since my undergrad degree was done outside the US. After they determined that I was capable of speaking understandable English, they actually made me an instructor of record for a 300-level course that none of the active faculty was available to teach.

It was quite an experience.

I think the problem is a misunderstanding of the purpose of the "mid tier university". The university is a research institution and not a vocational training institution. Students who wish to learn a trade (plumbing, computer programming, etc) can attend one of many fine (and possibly less expensive) public and private vocational training institutions that have relationships with local businesses that hire graduates.

Students who wish to explore topics and solve problems at the limits of current human knowledge generally enroll at the University. At these institutions, undergraduates have an opportunity to work in research laboratories and graduate students, postdocs and faculty have the responsibility to obtain grant funding to keep their labs operating. As a courtesy, occasional classes are taught (to get students up to speed on basic theory) but there is an expectation that students are self-driven and can figure things out on their own (hence its ok if the TA is not a native English speaker, etc).

TLDR: If you want to learn a trade, go to vocational school. If you want an education, go to University.

This was exactly my experience 20 years ago. Shame things haven’t changed.

I was going to ask if any attempt was made to reach the book author for comment, but it looks like he died in 2007. RIP.


It appears that someone having read your comment here has left a horribly abusive comment on that memorial page now under the handle "Screw You Bob".

> ... The Hacker News army is here.

"Screw You Bob" you do not speak for me. Hopefully, you don't speak for many people on HN.

Words fail me. Can't we let the dead rest in peace?

They certainly don't speak for me. In fact while I found this post interesting, and I like how the author tried to get into Traister's mindset to empathize with how he may have become misguided, I felt like it meandered into personal attacks a bit more than necessary.

Maybe I'm just sensitive because I have gotten excited about having learned something and rushed to share my knowledge, only to be cut off at the knees by the more august folks who told me I was in fact doing it wrong. :-)

But also, we've all written bad code, right? Have you ever written code that you've held your nose while writing, working under annoying idiosyncrasies of the problem domain / business requirements, and you just know at some point another developer is going to come along and read that code without having the full context of the situation, and think you're an idiot, and you won't be there to defend yourself?

While it's not quite the same situation that's the kind of empathy I'm feeling here. Who knows the circumstances that led to this book. The author of this blog post admits there was a dearth of material on learning C pointers at the time so I think Traister's heart was in the right place in trying to fill that gap, however misguided.

Only yesterday, I read a comment by someone on HN in which they wished that hacking were punished with the death penalty or a jail term.

Fortunately that comment appears to be gone.

I'm surprised there is no way to report one of those on the site.

There is a feedback form in the lower right, as well as an email address.

Damn, that's sad. I had a similar moment yesterday while thinking about Max Allen, the interviewer during that CBC show with Ted Nelson where they were talking about computers in 1979.

I wonder if these people ever got to know a bit about the things they didn't understand at the time. I also wonder if I'll ever get to know about the unknown unknowns in my life :)

What's also sad is some idiot went there from that link and left an abusive comment on his memorial website referencing Hacker News. Pretty pathetic, whoever did that.

I think I speak for all the developed brains on this website when I say that defacement is pathetic and weak, clearly coming from an impotent individual.

Rest in peace Mr. Traister, we love you even though C kinda thorned ya. :-)

Or as my mother taught me: "people who write in public places have stupid names and ugly faces."


The fixed-location variable allocation strategy the author mentions is called overlaying or compile-time stack[0]. It's still very much alive today, thanks to architectures like 8051 that are not really stack-friendly (even though they do have a stack).

> The Keil C51 C Compiler works with the LX51 Linker to store function arguments and local variables in fixed memory locations using well-defined names

[0] http://www.keil.com/support/man/docs/bl51/bl51_overlaying.ht...

Is there anything I can read to learn more about overlaying in C? I was reading, to the limits of my abilities, the 2.11BSD source, and there were a lot of references to overlaying.

On a modern POSIX environment, you can hack together an approximation of an overlay by loading some code to an area of memory, mprotect( ptr, len, PROT_EXEC) the segment, ((int(*)())ptr)() to call the function and if you did everything right, you don't segfault too hard. Later, you can overwrite that area of memory with other code and repeat.

This is similar to what actually goes on under the hood of the dynamic loader. http://tldp.org/HOWTO/Program-Library-HOWTO/dl-libraries.htm...

That is a different sort of overlay, a code overlay.

The other example was data, with the compiler assisting by changing local variables to have fixed addresses that get carefully reused for different variables at different times.

Maybe this weird strategy exists elsewhere, but he's supposedly teaching Borland Turbo C with the small memory model.

That compiler uses a run-time call stack.

Yesterday I encountered a similar program in an HN comment chain, like the one shown in this link. I am genuinely confused as to why this program is bad. I am a student and I do not know the best practices regarding pointers, but it is how I would write a program to combine two strings.

Can someone please elaborate on why it is bad? Are there any good resources to fill gaps in my knowledge?

Thanks in advance.

Edit: Thank you guys for pointing out so many problems. It seems that I have a lot to learn. :)

It is wrong in many ways.

It copies s and then t into a fixed-size buffer, without any checks. That will write to invalid memory (probably smashing the stack) if len(s) + len(t) + 1 > 100.

It returns a pointer to a stack-allocated buffer (r) to the caller. The array will be invalid when the function returns, as automatic variables only live in the function's scope (during the call); they are deallocated when the function returns.

To do this right you have various strategies.

1. Allocate a buffer of len(s) + len(t) + 1 with malloc, copy the strings, and return it. Have the caller free it when it's done. This can be inappropriate because of the dynamic allocation.

2. Have the caller pass a destination buffer and its size. If you know the char *'s are zero-terminated, check whether you have space for them in the dest buffer. If not, truncate or error out. Most of the time, this is the preferred solution.

3. Use a static local buffer, and return it to the caller. You may need to truncate the copy too. Not recommended: the function will not be reentrant (unsafe with multiple threads).

You can use libc functions like strncpy (C89+), snprintf (C99+) etc to make a "size-checked" copy with various automatic truncation semantics. You can refer to their man pages for details.

Edit: to the downvoters, please point out what is wrong in the comment.

You're right, although any mention of strncpy should come with a big disclaimer that it's effectively broken, because you can end up with a non-NUL-terminated string in some cases. strlcpy should be the way to go, but unfortunately it's not part of the C standard and not available everywhere (sometimes for rather bullshit reasons IMO, but that's a different story).

Documented behaviour is a little different to "effectively broken". The difference between strncpy and strlcpy is that strlcpy will NUL terminate the last byte for you always. There is nothing stopping you from doing the same thing yourself when you use strncpy. If you care enough, write your own strlcpy - it's only one extra line.
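For illustration, the "one extra line" version might look like this (my_strlcpy is a made-up name to avoid clashing with any libc-provided strlcpy; note the real strlcpy also returns strlen(src), which this sketch doesn't):

```c
#include <string.h>

/* strncpy plus forced termination. */
static char *my_strlcpy(char *dst, const char *src, size_t n)
{
    strncpy(dst, src, n);
    if (n > 0)
        dst[n - 1] = '\0';   /* the extra line: always NUL-terminate */
    return dst;
}
```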

I'm not saying it's hard to work around but I maintain it's broken. You have a function that deals with C-string that in some conditions returns something that's not a C-string but can't be trivially distinguished from one and will trigger undefined behavior if used like one. It's terrible ergonomics and almost certainly not what you want to do in any situation.

You can argue that truncation is an error condition but then it ought to notify it somehow, for instance by returning NULL in such a case. And even then it's incoherent with snprintf which doesn't have the same behaviour and does always terminate with '\0' even in case of truncation (assuming non-0 buffer length, of course).

It's just an unnecessary footgun that serves no practical purpose. It would be like a date function that gives you today's date except on the 4th of December where it replies that it's the 31st of February. Not hard to work around but still broken.

It's not broken, but it is misnamed.

This is because it is not intended to work with the same kind of string that the other str* functions work with (ie. an ordinary null terminated string).

Instead it's supposed to work with fixed-width string fields that pad out values shorter than the field width with nulls. This is how original UNIX directory entries were stored.

See how the name is copied into u.u_dbuf here: https://github.com/hephaex/unix-v6/blob/daa355109625a50e6b10...

I see your point but at this point I think it's just a matter of taste. I don't really see how having a function meant to deal with a special case of character buffers disguised as a general purpose string manipulation routine in the stdlib could be considered reasonable. I understand why it's here, I understand the history, I understand why it made sense at some point to have such a function, but you won't be able to convince me that it's not broken or that it shouldn't be deprecated in favor of strlcpy (ditto for strncat/strlcat). After all it is in <string.h>, not <fixed-width-string.h>; it's pretty heinous that it fails at the very low bar of actually producing a valid C string every time (especially given the serious harm rogue unterminated "strings" cause in a C program).

It seems fairly unlikely that the C standard would add strlcpy() and strlcat() when it already has strcpy_s() and strcat_s() in Annex K.

A misname that may have cost billions in bugs and security issues.

It has also avoided security issues, some of which get created when people get the idea that every strncpy must be replaced by strlcpy.

The strncpy function writes to the entire buffer. This is important if you will be passing the buffer across a security boundary, for example in a network packet or as a struct copied into a publicly visible file. If trailing bytes are not cleared, then secret data (which happened to be sitting in memory) can get leaked.
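A small check of that zero-fill behaviour (the helper name is invented for the sketch):

```c
#include <string.h>

/* strncpy pads the whole destination with '\0' when the source
   is shorter than n, so stale bytes cannot leak. */
static int tail_is_zeroed(void)
{
    char buf[8];
    memset(buf, 'X', sizeof buf);       /* simulate stale "secret" bytes */
    strncpy(buf, "hi", sizeof buf);     /* copies "hi", zero-fills the rest */
    for (size_t i = 2; i < sizeof buf; i++)
        if (buf[i] != '\0')
            return 0;
    return 1;                           /* no stale data remains */
}
```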

I agree it's terribly named, however discussing the behavior of strncpy is important because the casual reader or new C programmer will see "don't use strncpy, it's broken" and then come away with the wrong idea. strncpy is not inherently broken but it's most likely not the correct function to use. As a C programmer it's important to understand why and what the alternatives are (and there are several).

I can use a sledgehammer to break my leg, but that doesn't mean the hammer is broken by design. I just have to be careful where I swing the hammer and what I hit with it.

If the sledgehammer defaults to "leg breaking mode" then it's broken. How often do you use strncpy actually intending it to return a non-NUL terminated string on truncation?

I hate this mentality in a lot of C circles that boils down to "there's no bad language, just bad programmers, man up pussy". I like C, I use it a lot, it's one of the first languages I learned and it's been my main "professional" language for more than a decade. Yet I can also see that it has many unnecessary sore points. Having switch not break by default, gets(), array shenanigans, some aliasing rules, the hundreds of completely different meanings for "static", macro hygiene and I could go on... You can say "it's not a big deal and it's not going to change at this point anyway" and sure, I'm not arguing for a revolution, but let's not act as if it's an absolutely perfect language and I'm an idiot who doesn't get it for pointing out these issues.

A sledgehammer defaults to nothing; it just lies around until someone picks it up and acts. You're basically saying that if a label on the tool misleads you, then you're not in charge and the tool is broken. That doesn't work in engineering, nor is it forgivable to skip formal requirements (like reading the documentation thoroughly). You have to look at your tool to find its best fit, or at least to find a handle; moreover, labels mean nothing - they are just nicknames for a synopsis. Whoever thinks otherwise should consider JavaScript or the like. We live in a world where labels create the entire meaning, but let's not fall into this idiocy in workplaces.

The Linux manual for strncpy makes a few good notes about its usage, and that carries weight, since it is official and will not change due to someone ranting about something again.

Absolutely. strncpy is a very badly designed interface. Is it documented? Yes. Does that make it any good? No.

Principle of least surprise. Idiot proof. Semantics that match common usage. Call it what you like.

I'm not arguing that it's a completely perfect language. But broken is an incredibly strong word for something that works when used as described in the documentation.

I think the API designer was trying to be parsimonious and not assume too much about what you want to do with the data in your destination buffer. The requirement was to provide buffer overrun protection when the source string is too large, and this function provides that. Beyond that, the requirement that the character array be null-terminated is your decision.

In fact, if you determine that it isn't null-terminated you can do other things besides null terminate it yourself. You might want to provide a helpful error message to consumers of your API rather than truncating their input data silently. Or you may try reallocating your buffer until the string fits. There's actual error handling that you can do with this function. In contrast, automatically null terminating the string makes that more difficult.

The other issues that you have with C seem like preferences. There's nothing wrong with not breaking by default in a switch statement as long as you know that that's what happens.

That's a poor analogy. It calls itself a string function but isn't. So it's more like an airsoft gun that actually shoots bullets, but you don't know until you decide to shoot yourself in the leg to see what it feels like.

If you had to name it according to unix traditions, how would you?

Maybe fldcpy and fldcat to make clear than a fld (field) is entirely distinct from a str. A better choice if you have a time machine would be to remove these from standard C altogether.

A documented behavior which causes many security and crash issues is, almost by definition, effectively broken.

Can you come up with an example of an API which you would consider to be effectively broken and yet is not actually broken? Presumably, it would be an API that's easier to misuse than strncpy.

NULL is (a macro that expands to) a null pointer constant. The null character '\0' is commonly called NUL.

I've written about strncpy: http://the-flat-trantor-society.blogspot.com/2012/03/no-strn...

Yes, I don't like the strncpy interface; it is very error-prone. I just gave it as a possibility. But I did refer to the man page and said "various truncation semantics" because of that, precisely, without going into more detail though.

Not sure why you're being downvoted. That's all pretty standard.

Method 1 is a contract you often make as part of defining the interface. For an example like this, such concerns are out of scope, so it's a perfectly reasonable choice.

Method 2 is common. Most of the strn*() and snprintf() etc, do this. This is the preferred method for some of my colleagues.

Method 3 is used in the standard library, though as mentioned is usually avoided.

I'd suggest not insulting the people reading your comment if you want some points back…

You are right. Fixed.

> snprintf (C99+)

It's been years since I have written C professionally (and I only did it for two years), but I felt that snprintf was the single biggest improvement that C99 brought to the table, at least for people that had to deal with strings. ;-)

Friends don't let friends use strncpy(). It doesn't guarantee NUL termination and wastes time zero-padding when it is rarely desired. Use C11 strncpy_s() or platform specific equivalents.


I haven't touched C in years, but here's my descending "wtf" list:

1. Returns pointer to stack-allocated data, which immediately becomes invalid. Instead, it should be using some sort of allocation (e.g. 'malloc'), or taking in a destination pointer.

2. 'r' is arbitrarily set with length 100. Smaller strings don't need all that space, and larger strings definitely will overrun.

3. The function signature is really awkward. Without any of the surrounding textbook content, I'm not sure what behavior is supposed to happen. At first, I expected something like 'strcat', which takes two char* and appends the second one to the first one. But that isn't happening here and instead it seems to require dynamic allocation. (Hiding allocations inside a function is generally kind of weird. Usually the caller should be responsible for passing in a handle to the destination.)

4. There's no sensible limit on the loop iteration. If the input 't' doesn't have a null terminator, this is going to throw a ton of garbage into the stack space (because 'r' is stack-allocated to a fixed size). And it may also run for a really long time.

5. 'strcpy' should usually be replaced by 'strncpy', which performs the same function but also requires you to provide a limit ("copy this string, but at most 'n' bytes"). That prevents a class of exploitable errors known as "buffer overruns". I don't know when the 'n' string functions were added to C or became popular, though.

This is a teaching exercise, so the fact that this is implemented as a separate function instead of calling 'strcat' from <string.h> doesn't seem like a big problem.

> 'strcpy' should usually be replaced by 'strncpy'

Sorry to butt in, but this is a bit of a trigger for me: I’ve had to fix a number of programs infected with this idea.

The main problems with strncpy are:

When the source string is shorter than n, strncpy will pad the target to n bytes, filling with zeros. This is bad for performance.

When the source string is longer than n, strncpy will copy n bytes but _not_ nul-terminate the target. So you need extra shenanigans every time you use it to cover this case.

So strncpy is hardly ever a good idea. Sadly there is no standard replacement that is widely accepted. More details at https://en.wikipedia.org/wiki/C_string_handling#Replacements
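A tiny demonstration of the second pitfall, no terminator on truncation (the helper name is invented):

```c
#include <string.h>

/* When the source is longer than n, strncpy writes no terminator at all. */
static int has_terminator(void)
{
    char buf[4];
    strncpy(buf, "overflow", sizeof buf);  /* copies 'o','v','e','r' only */
    return memchr(buf, '\0', sizeof buf) != NULL;
}
```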

I agree with you completely, and in general think the whole idea of using "safe" string functions with built-in buffer length checking is wrong because it is a solution to a symptom, not a cause.

Before writing to the buffer you should've ensured that it's big enough, and decided what to do if it's not, long before actually doing it. In other words, what happens if it's not big enough? These "always use $length_checking_function" proponents miss that point. Yes, you've avoided an overflow here, but chances are something was already too small long before the flow reached here, and the fix is not to replace an overflow with truncate/not copy/etc. here, but fix the check/sizing that came before elsewhere.

> Before writing to the buffer you should've ensured that it's big enough, and decided what to do if it's not, long before actually doing it. In other words, what happens if it's not big enough?

If you planned all this out, you're still making an assertion as to the length. The contract is "give me a string of this length" and if that's not enforced by the compiler, it ought to be enforced at runtime so that the error is detected and dealt with as soon as possible.

So maybe "safe string functions" should really be "fail fast string functions."

> Sadly there is no standard replacement that is widely accepted.

True, but practically: when BSD extensions are available, the strl* functions are used. If only the C standard is available, snprintf is preferred. I have seen C libs that check for strl* availability and, if absent, reimplement them using snprintf.

So for portability, snprintf is the way to go. For correctness, and pushing for their extended use, strl* is nice.

The least bad options seem to be strl* or snprintf(..., “%s”, ...) but yeah nothing is perfect.

There's strlcpy, but it's not part of POSIX unfortunately.

  #define strlcpy(d, s, n) snprintf(d, n, "%s", s)
Not quite the same (different return type) but close.

> 5. 'strcpy' should usually be replaced by 'strncpy'... That prevents a class of exploitable errors known as "buffer overruns".

To be honest, strncpy is barely better in this respect (as a security improvement): truncating against an arbitrary size limit, in this day and age of text-only protocols... I wonder if outright crashing at the testing stage would be preferable to subtle misbehavior creeping into the release.

Both are bad IMO, the actual required buffer size should be known in advance.

Raw null terminated strings are just a bad idea. `std::string` and the bafflingly just-introduced `std::string_view` are the right way to handle strings. We can spare the bytes now.

> Raw null terminated strings are just a bad idea.

Absolutely agree. Scanning the memory until "we find it", potentially crossing boundaries between segments of memory with different characteristics (caching etc.) just doesn't seem right in general, and if I recall, some CPUs even used to have published errata related to that.

> std::string` and the bafflingly just-introduced `std::string_view` are the right way to handle strings.

I'd even go straight to a custom implementation of Hollerith strings. Literals have lengths known at compile time, protocols would either carry the lengths alongside the strings, or be trusted (to have good strlen behavior) until they do, composite strings would compute the length from the components, etc. This doesn't seem too complex to do, looking from my bell tower, but I know many people here would frown upon mentioning C++ in the context of embedded development (my area).

I've mentioned elsewhere that strncpy is not a "safer" strcpy.

Even if it were, there's safety and there's safety. A function (like strncat, for example) that quietly truncates your data if it's too long isn't necessarily better than one that quietly ignores array overruns. Consider what happens if "rm -rf $HOME/tmpdir" is quietly truncated to "rm -rf $HOME/"

strncpy is not a safe strcpy, for any value of "safe". Period. The str in the name is really misleading, as it's not really a string function to begin with.

> If the input 't' doesn't have a null terminator

Then it's not a string.

Sure, but it is still a valid 'char *' :)

That's something C's type system doesn't check for. If you want protection for this case, use C++ or any other higher-level language instead.

Well, to be pedantic, C++'s type system doesn't check for that either; it just passes around a size_t and a char *.

I was referring to std::string, which is what you should be using if you're handling textual data natively.

The implementation of std::string is 99.999% of the time struct { char *s; size_t len; }, which has nothing to do with an actual type.

And it's even possible to use std::string as a buffer for binary data including NULs. I won't recommend it, but it works.

I'd suggest using a std::vector of byte-sized integers for clarity, though there's nothing wrong from a standards point of view in using a std::string.

Also, it allocates ints x and y, saves a value to y, copies it to x, and only uses x from then on. y is entirely redundant.

Besides stylistic (dang, that code is hard to read), efficiency (it scans through s twice), and safety (it will do bad things if the combined length is > 100) concerns, it's also just plain wrong:

It returns a pointer to memory on the stack; but the stack will shrink when the function returns, and that piece of memory will get re-used. Which is consistent with the article's later claim that "I don’t think he understands the call stack."

From a purely safety-minded perspective, this function has a hidden bound of r[] past which the function becomes unsafe, but it does not check. Not only does it not check, the design of the function means that there is no possible way for it to check safely. s and t are both pointers to characters? How long are the strings they might represent supposed to be? Who knows? This code is incredibly reckless and it's dangerous to introduce students to such carelessly designed examples when first impressions will define how they program for a few years or so.

Ok, so it's reasonable to think that this is supposed to be a teaching example and that considering these concepts might be a bit too early in the process. This leads into the second problem and one that is more subjective: this code is incredibly dense and relies on enough quirks of C that it's almost never going to be clear to a beginner reader what they are supposed to take away from it. It's maybe useful as a quiz question on C syntax and semantics, but there are enough barriers to understanding what the code is supposed to do that the amount of explaining the text would need to do to describe what the code is doing is most likely prohibitively long. Instructive examples should be unambiguous in what they are trying to show, otherwise students will be confused and potentially conflate issues in a way that is difficult to untangle later.

Edit: Ha! I spent so long looking at the first half of the function I totally missed that it was returning r! So, not only does this code have minor issues here and there from its careless implementation, it has a fundamental flaw that, if it were to work, would do so only by accident. I can only imagine that a student might walk away from this example thinking that C functions can return arrays and possibly misunderstand scoping in C.

I'm sorry, but I don't think you've quite gotten the reasons why it was lambasted:

> Not only does it not check, the design of the function means that there is no possible way for it to check safely. s and t are both pointers to characters? How long are the strings they might represent supposed to be?


> This leads into the second problem and one that is more subjective: this code is incredibly dense and relies on enough quirks of C that it's almost never going to be clear to a beginner reader what they are supposed to take away from it.

Most "nice" C functions are much terser than this.

strlen is not safe and the design of this function does not permit it to ever be safe. That is my point.

Most "nice" C functions are also not used as teaching examples. There's a difference in how one writes C code for production use and instructive use.

I take it you come from a higher level language, where null termination would seem risky. In C, however, strlen is considered safe (as opposed to say strcpy, strcat, etc. which do have "safe" replacements). As for terse examples being given to beginners, here's the example The C Programming Language gives for strcpy:

    void strcpy(char *s, char *t) {
        while ((*s++ = *t++) != '\0')
            ;
    }

C11 introduced strnlen_s, a "safe" replacement to strlen.

strnlen_s is in Annex K, which is an optional part of the standard. Incidentally, no libc in widespread usage has implemented it.

True. The non standard strnlen is however more widespread. Microsoft implements strnlen_s in terms of strnlen like so.

    _String == 0 ? 0 : strnlen(_String, _MaxCount)

What’s safer about it? Doesn’t `strlen' just scan until the null character, then go

    nullCharPtr - str
? Is there something unsafe with that?

strnlen_s takes a parameter for the maximum number of characters to scan. This way, it won’t overflow the buffer you provide it if there’s no trailing null byte.
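For platforms without strnlen/strnlen_s, the bounded scan is easy to sketch (bounded_len is a made-up name):

```c
#include <string.h>

/* Bounded strlen: never scans past max bytes, so an unterminated
   buffer can't run away. */
static size_t bounded_len(const char *s, size_t max)
{
    const char *p = memchr(s, '\0', max);
    return p ? (size_t)(p - s) : max;
}
```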

What if there is no null?

Then you probably shouldn't feed it to the str* functions at all. Use the mem* functions instead.

That's the same as saying you should never use the str* functions. Nothing in C guarantees a string must have a null byte. As you're well aware, C-strings aren't a separate type that's distinct from char arrays.

Sure you can say that good programmers will always ensure a C-string is a C-string but there's decades of programming history that shows that's not true in practice.

Then use a language other than C that doesn't shoehorn string handling into char *, as I've mentioned in other comments. If you have a char * with no terminating null byte and you hand it to a function with "str" in its name, you're going to have a bad time.

If there isn't a NUL, then you're asking for a property relevantly distinct from length, of a datum relevantly distinct from a C-string. Use `strchr(s,0x00)` or `memchr(s,0x00,zs)`.

You haven’t read K&R C, I’m going to guess. Terseness is a traditional component of C and UNIX (cf. creat). I don’t care to debate the merits, but that is a fact. Most C books in the 80s were written in this style. Furthermore, in the days of micros, code was generally terse for many and varied reasons. There were people typing code on membrane keyboards.

Not a C programmer, so I'm sure I'm missing a ton of more subtle issues, but some of the immediate questions to ask:

* What happens if s or t are longer than 100 characters?

* What happens if s and t are both longer than 50 characters?

* What happens if no element of s == '\0'? How about t?

* r is allocated on the stack. What happens to the memory pointed at by r when you call another function after calling combine? Say you wanted to combine three strings; could you call combine twice to build up the result?

For the first two points, you end up overwriting the stack and most likely crash. This is how security holes were/are made.

For the third point: you have two main ways of storing strings in general. The first is called Pascal style, where the length is stored first, in one or two bytes, followed by the string data with no (or an optional) terminating null character (the '\0'). The second is referred to as C style, and is done by storing the string in memory and denoting its end with a null character.

The C style enables certain nicer-looking C code with loops and such, but it is more dangerous, and potentially expensive if you recompute the length all the time. You can always mix and match the ways you store the string, as in C++, where the string class stores a length as well as a null-terminated string in a buffer.

For your fourth point: yes, you end up corrupting the string, leading to more crashes, and/or it can be used as an entry point to screw around with your program's stack.

Back in the day it was pretty much granted you ended your strings with a \0.
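The Pascal-style layout described above can be sketched as a struct (all names here are invented for illustration):

```c
#include <string.h>

/* A length-prefixed "Pascal-style" string: the length is stored
   explicitly, so the data need not be NUL-terminated. */
struct pstr {
    size_t len;
    const char *data;
};

static struct pstr pstr_from_cstr(const char *s)
{
    struct pstr p = { strlen(s), s };
    return p;
}
```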

There are multiple problems with that code:

1) return(r): he is returning a pointer to a temporary! When you declare and initialize variables on the stack after that function, they will overwrite the memory pointed to by r.

2) He is assuming the size of the string pointed to by s (including the 0) is less than 100, and also that the combined sizes of s and t are less than 100. Stack overflow!

3) Not errors, but these would not pass my code review: Inconsistent Variable declaration, and using a signed integer for a loop where you don't need signed. Also I would have used strcpy 2 times, or a loop 2 times, not two different ways. Besides that, you should not use strcpy but strncpy, to avoid stack overflows.

> Inconsistent Variable declaration

How so?

> I would have used strcpy 2 times

You can't do this, because strcpy doesn't give you the length of the string you copied, which is necessary to put the trailing null byte.

strcpy writes a trailing zero.

> The strcpy() function copies the string pointed to by src, including the terminating null byte ('\0'), to the buffer pointed to by dest.

Yup, I'm aware of that, but the author of that code sure isn't ;)

There are a few bugs in it. It returns something on the local function's stack (r) which gets torn down after return. It uses the insecure strcpy, allowing the caller to stack-overflow the destination. It increments t in the loop after dereferencing it which only changes the value pointed to by t. x and y are improperly initialized, it null-terminates r without checking x's value (again a buffer overflow), and there is no upper-bound limit checking in the for loop (second for() parameter). I'm sure there are other bugs too but I haven't looked at it too closely.

But, yeah... it's a massacre.

> It increments t in the loop after dereferencing it which only changes the value pointed to by t.

This is wrong. Postfix increment has a higher precedence than the dereference operator.

Are you sure? I'll need to test this. I always thought postfix increment and decrement applied after everything else was evaluated, and that's why the prefix versions exist.

Yes. Here is a table that lists operator precedence in C: https://en.cppreference.com/w/c/language/operator_precedence

Note that postfix increment is at the highest precedence and the dereference operator is at the second highest precedence.

The postfix operators behave in a bit of an interesting way; the expression "t++" evaluates to just "t" but increments t as a side effect. Consequently, it appears to update t after the expression it is part of, even though it is one of the first things evaluated.

If you actually test the function, you will find that it works provided that you make r static or global instead of a local stack variable -- and, of course, carefully mind the restriction on the length of the input strings to avoid overflowing the buffer.
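The `*t++` behaviour described above can be checked directly (a tiny self-check, not from the book):

```c
/* *t++ parses as *(t++): the dereference sees the old pointer and
   the increment is a side effect. */
static int postfix_demo(void)
{
    char s[] = "ab";
    char *t = s;
    char c = *t++;          /* copies 'a', then t advances */
    return c == 'a' && *t == 'b';
}
```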

Thank you! I learned something new.

> x and y are improperly initialized

How so?

I said that because I forgot the strlen was there. I was wrong.

I know there are already plenty of comments about the problems but might as well give mine too...

In one sentence: this guy managed to reinvent a variant of strcat and get it completely wrong.

Here's a somewhat better version:

    char *combine(char *s, char *t) {
        size_t m = strlen(s), n = strlen(t);
        char *ret = malloc(m + n + 1);
        if (ret) {
            memcpy(ret, s, m);
            strcpy(ret + m, t);
        }
        return ret;
    }

You already know the length of t. strcpy is inefficient when another memcpy will do.

Minor bug:

If s and t are very long and overlapping, m + n could wrap around.
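A hedged variant with that guard added (combine_checked is a made-up name):

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Reject inputs whose combined length would wrap size_t before
   malloc ever sees it. */
static char *combine_checked(const char *s, const char *t)
{
    size_t m = strlen(s), n = strlen(t);
    if (m >= SIZE_MAX - n)          /* m + n + 1 would wrap */
        return NULL;
    char *ret = malloc(m + n + 1);
    if (ret) {
        memcpy(ret, s, m);
        memcpy(ret + m, t, n + 1);  /* +1 copies t's '\0' */
    }
    return ret;
}
```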

It returns the result r, which is a temporary, stack-allocated array that becomes undefined by the time you return from the function. That’s the worst violation.

Furthermore, r is a fixed 100 bytes. There is no overflow checking whatsoever.

The C syntax is a bit too dated for me, but I think it returns a pointer to a stack buffer (which is capped at 100 bytes, so it could overflow) and goes through s twice? Plus it's not concise at all.

> Can someone please elaborate why it is bad? Are their any good resources to fill gaps in my knowledge?

If you're actually learning C and writing programs in it then the two best things to do would be turn compiler warnings up to 11 and run valgrind often. So you want -Wall and -Wpedantic when you compile and if possible run with valgrind as part of your build script or run it with your tests, just run it often (the longer between runs the harder it is to track back to which change you made).

Or -fsanitize=address, if your OS does not support Valgrind.

Note that you forgot '-Wextra', but even `gcc -Wall -Wextra -Wpedantic` doesn't enable all diagnostics (there are some 40 additional arguments to provide, depending on GCC version); so what you really want is a recent version of Clang and '-Weverything'.

Putting my CR hat on:

   1 char *combine(s, t)
   2 char *s, *t;
   3 {
   5   int x, y;
   6   char r[100];
   8   strcpy(r, s);
   9   y = strlen(r);
  10   for (x = y; *t != '\0'; ++x)
  11     r[x] = *t++;
  13    r[x] = '\0';
  15    return(r);
  17 }
There are several critical memory errors here.

1) The function is returning the address of a local variable. This alone makes this function rubbish.

2) The pointers s and t are unknown length, we have no guarantees that concatenating them will fit in a 100 character array. Also, he probably should be using malloc to dynamically allocate the space.

3) Use of the function strcpy rather than strncpy. He should have measured the length of s first, and then used strncpy if s was longer than 99 characters (don't forget the null terminator!). Then, rather than using a for loop, call strncpy again to copy the rest into the safe buffer size (the for loop is rather silly). The reason is that strcpy will cheerfully keep copying past the array 'boundaries', and in this case, since he's copying into a local variable on the stack, he's setting himself up for a remote code execution attack if this ever gets untrusted input.

So those are the critical errors. These tie directly into why Geoff argues that the author doesn't understand the stack.

So let's, for educational purposes, go into this. We're going to go a bit into the weeds here. Sorry about that. This would be easier with a whiteboard. :)

When you fire up a program, the program's machine instructions get copied into memory, let's pretend at memory location 0x1000. Far away from that code, at the highest memory addresses (more complicated on modern virtual-memory systems, but hey, let's go back in time here :) ), the computer keeps track of a location called the stack pointer.

I'm going to put forward 3 diagrams now. Please forgive any off by one errors.

  (Diagram a)
    A 0
    B 0
    C 0
    SP 0xffff
    PC 0x1000

    Address  Mnemonic   DATA
  PC0x1000   MOV 1, A   0x00 0x01 0x01 0x01
    0x1004   MOV A, C   0x00 0x01 0x03 0x01
    0x1008   PUSH 3     0x01 0x00 0x00 0x03
    ...      ...
    ...      ...
    ...      ...
    0xfffc   XXXXXXXX   0x00 0x00 0x00 0x00 <-- Stack Pointer is here
The program starts at 0x1000, then after executing the first two move (MOV) instructions, the state of the world becomes as follows

  (Diagram b)
    A 1
    B 0
    C 1
    SP 0xfffe
    PC 0x1008

    Address  Mnemonic   DATA
    0x1000   MOV 1, A   0x00 0x01 0x01 0x01
    0x1004   MOV A, C   0x00 0x01 0x03 0x01
  PC0x1008   PUSH 3     0x01 0x00 0x00 0x03
    ...      ...
    ...      ...
    ...      ...                     v------\
    0xfffc   XXXXXXXX   0x00 0x00 0x00 0x03 ^-- Stack Pointer is here
When you have code like

  int function() {
     int a = 5;
     int b = 2;
     return a;
  }

  int main() {
    return function();
  }
It'll get turned into something like (I've set a 'break point' at 0x2010)

  (Diagram c)
    A 5
    B 0
    C 0
    SP 0xfff8
    PC 0x2010

    Address  Mnemonic      DATA
    # Main starts here
    # (Note, in C, there is actually code that gets executed before this)
  PC0x1000   PUSH 0x1008   # We want to remember where to return to, so we push it to the stack.
    0x1004   JMP  0x2000
    0x1008   EXIT A        # In this implementation of C, the A register will propagate return values
    ...      ... 
    # Function 'function' is here
    0x2000   PUSH 5        # Local variables go on the stack.
    0x2004   PUSH 2
    0x2008   MOV [SP+2], A # Locally, we refer to local variables by
                           # offsets to the stack pointer, so if this function
                           # were to call itself, the stack would keep growing down
                           # but these values would be good.
    0x200c   MOV SP+2, SP  # Reset the stack before returning
  PC0x2010   JMP #SP       # Made up notation. Look at the value of the stack pointer, pop it, and jump to it.
                           # in x86, this is kinda what RET does.
    ...      ...
    ...      ...           ...
    ...      ...           ...         
    0xfff8   XXXXXXXX      0x00 0x00 0x00 0x00 <-- Stack Pointer is here
    0xfffc   XXXXXXXX      0x02 0x05 0x10 0x08
Okay! So with the above diagrams in mind, let's recap what goes on the stack: local variables and return addresses. Each time a function gets called, it moves the stack pointer down[1] (to lower memory addresses) to make room for local variables. So after you return from that function and then call another function (or heck, the same one), the pointer that was supposed to point at the concatenated string will have its contents overwritten.

Furthermore, if the input strings are longer than expected, then they can overwrite values on the stack itself, including the return address, causing your program to jump to some (if you're lucky) random location in memory.

Honestly, some of the best ways to get intuition for how the stack works, and the things that can go wrong, are the CTFs at overthewire.org.

Also, https://microcorruption.com/



[1] Sorry, 'down' means lower memory addresses, even though the displays of memory layouts always have lower memory addresses "up". :(

Most of these points are covered by the other comments. As a C programmer professionally, I'll go into a little more depth, and offer an alternative implementation for comparison.

The function in question:

    char *combine(s, t)
    char *s, *t;
    {
          int x, y;
          char r[100];

          strcpy(r, s);
          y = strlen(r);
          for (x = y; *t != '\0'; ++x)
               r[x] = *t++;
          r[x] = '\0';

          return r;
    }
1. The array 'r' is allocated on the stack, and returned from the function. This is bad because 'r' goes out of scope as soon as the function returns. This function returns a pointer to memory with essentially unknown contents.

2. The array 'r' is allocated at a fixed size. This is okay if you know the required size ahead of time; however, for this function we don't know the lengths of s and t, so the odds that we are allocating the right amount of memory are slim.

3. As a result of points 1 and 2, the strcpy and the loop may cause a buffer overflow. This is a class of bug where it is possible to overwrite memory that should be unavailable to us. In this case, if the combined length of s and t is equal to or greater than 100 characters, we will be overwriting memory that does not belong to r, corrupting it and potentially crashing the program. This may additionally be a security risk, as buffer overflows can be exploited to execute malicious code.

4. There are no checks to see whether s and t are valid pointers. If they are NULL then the function would generate a segmentation fault. This is a check that is often ignored in cases where it is deemed to potentially hinder performance if the function is used frequently.

saulrh also mentions the case where s or t are not NUL-terminated. This is often considered to be a pre-condition of the function in C, meaning that passing the function strings that aren't NUL-terminated is the caller's problem.

For some comparison, the following would be my first cut at the same function. Note that the comments are only for illustrative purposes; I'd omit them in actual code.

    char* combine(const char *s, const char *t)
    {
        size_t slen, tlen;
        char *str;

        // optional checks for validity (point 4)
        if (NULL == s || NULL == t)
            return NULL;
        // get lengths of s and t, to calculate allocation size
        // also save them to use with memcpy later
        slen = strlen(s);
        tlen = strlen(t);
        // allocate on the heap (point 1), with the correct size (point 2)
        str = malloc(slen + tlen + 1);
        if (NULL == str)
            return NULL;
        // use memcpy since we already know the sizes
        memcpy(str, s, slen);
        memcpy(str + slen, t, tlen);
        str[slen + tlen] = '\0';
        return str;
    }

    if (NULL == s)

I haven't seen a single case where this abomination actually helped catch the fearsome 'if (s = NULL)' typo. One needs to be a sloppy typist, not paying attention to what they write, not proof-reading the code before committing, and ignoring compiler warnings, for this disaster of a notation to be even remotely justified.

Yoda conditions are protection against an issue that no longer exists in most compilers.


Precisely! Yet they still pop up and ruin readability of the code just the same.

Better yet, just use the fact that NULL is false and get shorter, cleaner, and safer code.

  if (!s)

God no... explicit is better than implicit.

Your statement there just does not read correctly; always make the condition explicit.

Why? It reads nicely as "if there is no string", can't get more explicit than that.

The biggest problem I have with this style is that it is inconsistent across variable types and provides no insight into the actual operation being performed without further knowledge. My personal experience is that my ability to read and comprehend code quickly is dependent on the context and patterns present within it (among other things). Therefore making the expression explicit as a pattern adds context and therefore increases readability and comprehension.

The statement '!x' tells me nothing about x, just that I'm expecting a given logical value. The semantic meaning of that value, however, I have no idea about. Odds are it is probably a NULL or a 0, but that doesn't help much because (at least in the linux world) they can have different meanings despite having the same logical value. I cannot tell whether !x is testing for failure (x == NULL) or for success (x == 0) without more context.

The statement 'x == NULL' immediately tells me at a glance that I am dealing with a pointer, and therefore I should pay more attention to how it is used. I know I should now be looking for other patterns of safe/unsafe pointer management that are immediately relevant to the function I am reading right now. There is no need for me to look elsewhere and go on some tangent to find out, then to have to recall what I was doing when I come back later. I can immediately start considering the likelihood of segmentation faults, or other memory management issues. x == NULL also tells me that here I am expecting failure of some kind. The context around that condition should then tell me which kind, eg NULL == alloc_thing() vs NULL == find_thing().

Similarly, the statement 'x == -1' tells me I am working with an integer. This can tell me immediately whether I am expecting success (x == 0), failure (x == -1), or that I should make a mental note of more complex possibilities (THING_WORKED == do_thing()).

Code is a narrative. It is documenting the answers to the questions you are asking while writing it, and it should answer the questions that someone else is asking while reading it. When somebody asks you to explain something, the least helpful thing you can do is answer a plain "yes" or "no". It is better to give context and answer further related questions before they need to be asked. Readability and comprehension of code are no different - any given programmer is asking questions of the code. The more questions they have to ask, the longer it will take them to discover the answers in order to understand the code. Context helps. Shorter is not always better.

What is not explicit about that statement? The evaluation of the variable is well defined in a Boolean context.

If you have to write "== true" every time, that's your own idiosyncratic hangup. It has nothing to do with explicitness; it's just redundant. The truth test was explicit the moment you wrote if (

And thus in your disgust you have actually given a case where this convention might save somebody time. If it happens to end up being your own time, you ought to be thankful for it. You certainly don't want it to be wasting your customer's time.

I actually saw an 'if (x = 0)' or similar get caught in review recently. Time pressure increases, tests are rushed, and authors proof-read the code in their head, not what's on the screen. These things do happen, and if your personal style preferences get in the way of using a simple trick that saves your own time in the best case, or multiple other people's time in the worst, then you might wish to reconsider your preferences - even if it is at 1 in 1000 odds.

For all intents and purposes this is one of many typo classes that just happens to have a cute "antidote", and which also has been all but rendered pointless by the existence of respective compiler warnings.

I've seen weirder stuff get through review and compile cleanly, something like "f,()". Typos happen, but (a) it's not a good enough reason to make the code less readable, and (b) if the code is prone to this sort of error, just pay closer attention during the review phase. Hedging against a single exotic type of mistake that virtually never happens, at the expense of code readability, is unacceptable.

I strongly disagree that this impacts code readability, except perhaps for a beginner, but stretching a beginner’s mind a bit with a concept that’s not exactly a major challenge to understand hardly seems like a crime against humanity.

You can avoid the need for setting the null terminator explicitly by using memcpy(str + slen, t, tlen + 1) as the second memcpy.

And this is why we have code review ;)

I've definitely bought a few programming books that, on closer inspection, just appeared to be money-grabs from the author.

But, my biggest memory is the book I didn't buy. I once worked with a programmer who wasn't very good, and then I heard he wrote a book. I typed his name into Amazon, and there was his book. It was all about the half-baked concepts he was trying to put into our failing project. (Ultimately canceled because we couldn't ship a very simple product. We couldn't ship it because everyone just wanted to add code generators and additional layers around a database... Instead of learning how to use a database.)

I couldn't get out of that job fast enough.

When I was learning C in the eighties, I bought a book about 3D programming, the worst programming book I've read. I believe the examples worked, at least the ones that I typed in did, but the style was atrocious. The concept of function parameters seemed to be totally alien to the author. The idiot created x1, X1, x2, X3, x, xthis, xthat... variables instead. He was a former BASIC book author too.

I can't warn you off it because I threw it in the trash bin long ago.

He was a former BASIC book author too

Hmm, I'm starting to see a pattern. Is it possible that BASIC, plus lack of internet back in the day, plus atrocious books are the reasons for turning people into terrible programmers? I happen to know only a couple of seniors, but without exception their code, no matter what language it's written in today, is horrible on all fronts. I used to think it was a lack of attention to detail, a lack of wanting to strive for even the tiniest bit more than just 'good enough for today'. Possibly stemming from lack of education and lack of continuous self-education. But maybe there's more to it. Maybe they were influenced by a bad book. And/or by a not-so-optimal language like BASIC.

I would say "it depends".

Just as poor programmers came later from Visual Basic, or MSVC++ (somehow we went through a phase where everyone coming for an interview with MSVC++ actually had C with classes, and for a while it was a warning sign and standing joke at the place I then worked). Getting people who claimed C++ and actually knew it was pretty challenging around the millennium.

In the 80s just about everyone who had an 8-bit machine started with BASIC, and an awful lot of them managed to go on to be perfectly acceptable programmers when they moved on to C, C++, assembler, and more recent languages, or even turned out a decent game in something like Blitz Basic on the Amiga. Then again there were some who couldn't move beyond BASIC, because it was simple enough that almost everyone could piece something together with it - so even back then it was a warning sign, like PHP can be today.

It's not age, and it's not BASIC - there's plenty of younger folks turning out abysmal code who've never been near it.

I wonder how I'd have turned out if I'd learnt C from one of this guy's books instead of K&R and having a couple of experts handy.

Might be totally wrong, but it's possible that MSVC++ programmers typically knowing 'C with classes' might come from the gaming world, as that's a fairly accurate description of some very popular game engines (Valve's Source engine[1] being the one I'm most familiar with, where e.g. std::string is unused in favor of char/wchar arrays).

[1] https://github.com/ValveSoftware/source-sdk-2013 (originally released in 2004)

It is a bit more complicated than that.

C wasn't much used in MS-DOS, as it was yet another systems language trying to gain the place of Assembly for high performance applications.

In some countries Pascal dialects (mostly TP-compatible) and Modula-2 reigned, while in others C and C++ were the main contenders.

By the time Windows (written in C) became mature enough people started caring about it (3.x), C++ was already having its place via OWL (later VCL) on the Borland side, and Microsoft eventually came up with MFC.

However, these were the days when C++ still didn't have a standard (which came in 1998) beyond the C++ ARM book, so either you would stick with the compiler's framework, or try to minimize language features for better portability.

Additionally, on Windows everyone was learning it via Petzold's book, where he takes the approach of C compiled as C++, not even "C with Classes".

Also, although OWL and VCL were great OOP libraries with nice abstractions, MFC was pretty much a thin Win32 wrapper, as its initial internal implementation (AFX) wasn't well received by internal MS employees for not being low-level enough over Win32.

So there were lots of issues going on that led to such cases.

Interesting, thanks, that clears up some of mystery from the Windows side of things. Explains why so many came in who clearly had experience, but were missing most of the ++ part.

Course back then STL (mostly now the standard library) was still sgi STL, and there was also RogueWave, both still fairly new and on the up.

Oh God, Petzold's book. What a piece of trash that was. The code, while not incorrect like the one in the article, was fairly atrocious, and the various UI things he did were unforgivable, if only because it encouraged legions of programmers to completely disregard any human interface principles. If I could erase one CS book from history, that might be the one.

You're not the first to notice a pattern, and you're in good company:

> It is practically impossible to teach good programming to students that have had a prior exposure to BASIC: as potential programmers they are mentally mutilated beyond hope of regeneration.

  --Edsger W. Dijkstra 
Though, to be fair, he would likely say that about a lot of mainstream languages today. IIRC he was very fond of Miranda, which in many ways was a precursor to Haskell.

To be fair, he said that about a lot of mainstream languages back then. That quote is from EWD498 at http://www.cs.utexas.edu/users/EWD/transcriptions/EWD04xx/EW... . The larger context is (the rest of the comment quotes him):

FORTRAN —"the infantile disorder"—, by now nearly 20 years old, is hopelessly inadequate for whatever computer application you have in mind today: it is now too clumsy, too risky, and too expensive to use.

PL/I —"the fatal disease"— belongs more to the problem set than to the solution set.

It is practically impossible to teach good programming to students that have had a prior exposure to BASIC: as potential programmers they are mentally mutilated beyond hope of regeneration.

The use of COBOL cripples the mind; its teaching should, therefore, be regarded as a criminal offence.

APL is a mistake, carried through to perfection. It is the language of the future for the programming techniques of the past: it creates a new generation of coding bums.

Having a discussion with someone who quotes Dijkstra can be frustrating. He wrote Go To Statement Considered Harmful and On the Cruelty of Really Teaching Computer Science and left behind a bunch of pithy quotes to mine besides. You end up having to explain why blanket bans on goto don't make sense, why it really is quite difficult to mathematically prove that a program works, or why it makes sense to teach students software engineering and not just teach them the mathematics.

At this point I'm ready to say "Dijkstra considered harmful" or "the use of Dijkstra quotes cripples the mind, their use in discussions should, therefore, be regarded as a criminal offence."

> Having a discussion with someone who quotes Dijkstra can be frustrating.

Agreed. Most of the time, it seems the pithy quotes are the only thing that the quoter knows about Dijkstra.

> You end up having to explain why blanket bans on goto don't make sense

For Dijkstra's GOTO quote specifically, there's a wonderful document by David Tribble named "Go To Statement Considered Harmful: A Retrospective" that does a line-by-line analysis of Dijkstra's paper, explains what Dijkstra meant in the context of his time, and examines where usage of GOTO still makes sense today.


Dijkstra is the Nietzsche of computer science: Eminent, quotable, often irritated, and prone to be taken exactly the wrong way by people who only half-understand his ideas.

The dicta of Dijkstra should be handled with the same accommodation of self-contradictory paradox as is accorded to zen koans.

Ask them whether they'd use Yogi Berra quotes to decide how best to manage a baseball team.

Dijkstra is the Godwin of Computer Science

I did learn BASIC first, but then Pascal, and in college and from books I very soon learned the foundations of structured programming. It was an instant aha!, since as soon as you write anything but trivial programs, you feel you need functions.

What I find disturbing is that the author thought every language was the same save a few keywords, and that the publisher was happy to put that crap into print.

Yeah, the quote about a professional C programmer rejecting it and the publisher publishing it anyway was telling!

Most of the major game engines of the past 20-30 years were programmed in a large part by people that started with BASIC.

You have only two data points, and obviously the second data point (the post you’re responding directly to) was remembered and posted because of its similarity to the main article, which is anything but an independent data point.

I think your conjecture is based on confirmation bias and a bad experience you’ve had with some colleagues.

Yes, it is certainly possible there's simply a lack of data points, but I don't think confirmation bias is at play here; at least for me personally, I only know a couple of very senior devs and they all have the same problem, so it's not like I'm actively cherry-picking only cases which seem to prove my point (which, btw, is just 'I see a pattern', not 'everyone who used BASIC once and had no easy access to proper sources sucks').

2 data points is not a pattern. An equally plausible explanation is that you're in an environment full of idiots, junior and senior alike, yourself included. Your numerical and analytical skills are not unsupportive of this conclusion.

Keep in mind BASIC was the most accessible language in the 80's. Almost all computers shipped with BASIC... It was like today's JavaScript.

I think the JS comparison undersells it. Many computers would literally drop you into a BASIC prompt at boot. I remember my first encounter with MS-DOS and finding it weird that you had to actually run a program to get to BASIC.

Yes, you are absolutely correct! BASIC seemed much more accessible on early microcomputers, compared to any programming language today. BASIC essentially was the OS. The Apple II, C64, etc. extended BASIC with DOS-like commands. They often shipped with BASIC manuals!

Apple’s guide to AppleSoft BASIC came with my IIGS and was my first programming book. It seems so much harder to get started now.

Yes, it definitely seems much harder. The BASIC guide that came with my Texas Instruments TI99/4A was my first programming book!

Replace BASIC with JavaScript, and the book with one about the Scala language on Spark, for a contemporary adjustment. You'll see the same there.

There's a sense in which the hardest programming language you'll ever learn is actually your second one. I think the reason for this is that with just one language under your belt, you have very little ability to distinguish between the abstractions the programming language offers you and the capabilities of the machine, and to distinguish between the abstractions the programming language offers you and the capabilities of programming itself. So for your first language, you're learning what is actually just an approximation and simplification of that first language, where all three of those things are all mixed together so you don't have to spend the cognitive effort to understand the differences and you can develop and rely on huge misconceptions without seeming to pay too large a price, but with your second, you're unlearning errors about the machine, unlearning errors about programming in general, and also learning a second language. Particularly difficult if you're making the leap from something like BASIC to C, where the second language is also substantially more difficult than the first.

For people of a certain psychological orientation, there is the additional challenge that, having put away your first language, you now think you are a "Programmer (TM)", and learning that second language and learning that you have a number of misconceptions can strike at your very identity. People can get psychologically attached to their misconceptions if it means retaining the illusion that they have mastery.

Nowadays the easiest way to screw this up is to go to a computer science/engineering program that uses just one language. As tempting as it may be from a curriculum-simplicity perspective, it's a big mistake. I've interviewed a number of people who think that Java === computing. Not even the "JVM", mind you, but Java, the language, itself. I don't blame Java for this; it's the education. Java itself is not a great lens to understand computer capabilities through, and it's a miserable language to be your lens for understanding the general capabilities of programming, especially 10 years ago. (It's slowly getting better, with easy closures and such, but it's still stuff bolted on the side 20 years later.)

Looking at it from that perspective you can see why 8-bit-era BASIC was even worse than that. It offers a very impoverished view of the computer's capabilities and a very impoverished view of the possibilities of computing. (It was possible to rehabilitate BASIC into at least a passable language; I'm glad I don't have to use Visual Basic to do my job, but it's still light years ahead of the BASICs that still used line numbers, and I've done Real Work (TM) in it, albeit a long time ago.) A 21st-century Java-only programmer is substantially better equipped than a 20th-century 8-bit-era BASIC-only programmer.

(By "8-bit-era", I mean the timeframe, not necessarily the CPU. I'm fairly sure there were BASIC implementations with line numbers and such on non-8-bit-machines, and they'd still be dangerous. But as computers got into the 16- and especially the 32-bit era, even BASIC had to grow up.)

I have to say that having GFA Basic on the Atari ST as my first real development language made a huge difference, given that it had code blocks, functions, local variables, typed variables, arrays and record types and built-in commands for OS calls, memory access, matrix operations and a ton of other stuff. Made moving to Turbo Pascal relatively easy.



> I've interviewed a number of people who think that Java === computing. Not even the "JVM", mind you, but Java, the language, itself.

Could you elaborate on this? What exactly made you realize that was how/what they thought?

I had to ponder on what it is that really sets this sort of person apart, and I think it's the sort of sneering disdain at the idea that any of the other languages in the world are worth anything, or have any good ideas. Or maybe it's the way that when you ask them what's good or bad about some other language, you get back just a list of differences those languages have with Java, and it is simply assumed that all differences are ways in which they are inferior to Java.

And let me say again that it's not specifically Java. I've seen a couple of people that way with C, for instance, though not in an interview situation.

I have an otherwise quite good book on Direct3D 9.0c (in C++ of course), but there are 'little things' which look like mere bad style, but which I'm pretty certain cause undefined behaviour.

Using the Windows 'ZeroMemory' macro to assign bitwise zero over a newly declared object, rather than using a constructor like god intended.

In C++, a null pointer isn't required to be represented as all-zero bits, so I'm fairly sure nasal demons are possible here. (Do we still have 'effective type' in C++?)

> Pointers to functions are seen mainly as a way to obfuscate your program. "A pointer to a function serves to hide the name and source code of that function. Muddying the waters is not normally a purposeful routine in C programming, but with the security placed on software these days, there is an element of misdirection that seems to be growing." (p. 109)

> "GIGO (garbage in, garbage out) is a term coined to describe computer output based on erroneous input. The same applies to a human being." (p. 152) — ???

(like the readers of this book?)


I'd be interested in a review of the author's C++ pointer book Conquering C++ Pointers


> "Both programs also contain another value of 43. This is the constant that was written directly into the program." (p. 29) — I have no idea what this means.

> I believe that the author thinks that integer constants are stored somewhere in memory. The reason I think this is that earlier there was a strange thing about a "constant being written directly into the program." Later on page 44 there is talk about string constants and "setting aside memory for constants." I'm wondering now…

Yes, most of the book is wrong. In this example, the author probably also presented this idea in a wrong way.

But the author is right to have the idea that a "constant [is] written directly into the program" (by the compiler!) and that "integer constants are stored somewhere in memory"; these are correct and make perfect sense. Of course integer constants and string constants are all allocated and stored somewhere in memory (or somewhere that can be mapped as memory), usually known as the text segment and the data segment.

> …(remember, the array name becomes a pointer when used without the subscripting brackets)"

> "…while a pointer, as always, is a special variable that holds the address of a memory location." (p. 57) — Still wrong, but slightly less wrong.

Good enough IMHO. It is true that an array "name" is a pointer to its base address.

It is true that an array "name" is a pointer to its base address...

No, it's not. It is true that an expression of array type, when it is not the subject of either the unary-& or sizeof operators, evaluates to a pointer to the array's first element.

  sizeof array
gives the size of the whole array, not the size of a pointer.

  &array
gives the address of the whole array, not the address of a pointer.

I've been programming under the useful but wrong assumption that

> array[x] and *(ptr+x) are completely equivalent, so array and ptr are equivalent.

until now. Thanks for the clarification.

Strictly speaking, the initial part of your assumption is not wrong, but the later conclusion is.

array[x] and *(array+x) are indeed equivalent for any identifiers 'array' and 'x' (assuming one of those evaluates as a pointer value, and the other as an integer value; otherwise the code is incorrect). In fact, in this context an actual array is not subject to either unary-& or sizeof operators, so it evaluates to a pointer value, fulfilling the precondition.

This is why "array subscripting" also works directly with pointers (i.e. "ptr[x]"), and from the equivalence above follows one of the common useless facts: that you can swap the operands (i.e. "x[array]").

(This comment is probably confusing enough without saying that "(&array)[x]" is valid code too, but isn't the same thing as those before.)

A constant can simply be part of the actual assembly instructions in the code segment. Depending on what type of constant it is and how large it is, it might just be inlined instead of loaded from memory. Shared use can also be a compiler heuristic: a large constant used 100 times is more likely to become a shared reference than to be inlined.

Accurate. They don't have to be inside the data segment - they can simply be part of the actual assembly instructions. But more or less, it means the constant is "being written directly into the program".

If "program" refers to the object code unambiguously, I don't think this expression is problematic per se.

Heh, not only this but there are also plenty of C++ books that are literally less than worthless.

Additionally, another common domain of clueless writing is computer graphics and the related math. There are so many articles written by (no doubt) enthusiastic people where the information just adds noise. Finding trustworthy, good-quality information requires that you already know a considerable amount, so you can tell what is good and what is not (talking about online content here) :)

Didn't some of the C++ books have to be crappy in some ways due to the arbitrary limitations of free (as in beer) compilers? One I vaguely remember had a ridiculously (even for the time) limited stack size, so a lot of examples had to do unnecessary allocation just so readers could compile them.

I think it was this book (https://www.amazon.com/Flights-Fantasy-Programming-Video-Gam...) that taught me 3D programming better than anything else. The code was readable, the maths was well explained and it included sections on how to do things without those newfangled maths co-processors. I'd love to buy a copy now just to see if it really was a good book or if it led me astray.

I doubt books like these are as common today, but tutorials are everywhere. I don't know how often I have found scheme tutorials that teach a language I barely understand. Not dangerous maybe, but very weird nonetheless.

I see beginners writing code like that all the time, which makes me sad.

Yeah, a few months (a year?) ago I was getting back into JavaScript, so I went to the Mozilla website [1] to refresh my knowledge of prototypal inheritance. And it seemed all wrong. So I actually ran their short bits of example code in Firefox and they worked like I expected them to. Their documentation is just nonsense.

There hasn't been a lot of activity on it recently so it's probably still wrong.

[1]: https://developer.mozilla.org/en-US/docs/Web/JavaScript/Guid...

It's books like this that lead to inventions like Java and Go, with explicit aims of avoiding certain difficult constructs.

Not that pointers are particularly difficult in the scheme of things, but if someone who doesn't understand them tries to teach them, pointers inevitably do become "difficult".

"...But like analyzing a terrible movie that somehow gets made, it's more fun to reason through the “behind the scenes” parts..."

I do too. Many times I find it much more interesting understanding how the sausage is made than what the actual sausage tastes like.

This is a great meta review. As somebody who's just published a book for tech teams, I need to be acutely aware of my own work to make sure I'm not falling down the same hole: taking a little bit of knowledge and trying to fluff it out to appear to be a comprehensive body of work.

It's not just the 80s and old coding books. There has been quite a trend over the last decade or so of people over-publishing (I guess that's the term): promising all sorts of things while delivering little.

In some ways I think this is okay. The received wisdom is that you don't have to know everything; you just have to know more than the reader and be able to explain how to move them a bit forward. Perhaps the key is attitude. Woz says: "...I've always found the authors come from a position of earnestness, attempting to draw the best conclusions based on decent principles and what they knew at the time they wrote it..."

A little humility, careful scoping, and honesty in a tech book can go a long way. (Insert long discussion here about whether people would buy such a book, and how people are much more naturally attracted to books with a strong emotional pitch, "Make money with C Now!", than to books that simply try to helpfully explain something without all the glitz.) There is a natural tension at work here.

It seems that the author followed the same logic as the author of the "English As She Is Spoke"[1] phrase book between Portuguese and English, with the French dictionary in between.


I always wondered about the C tutorials Brian Kernighan mentions in his talk (https://www.youtube.com/watch?v=8SUkrR7ZfTA); many of the examples seem intentionally designed to be incorrect by some trickster spirit.

Now I know the even darker truth.

Thanks for sharing! I didn't know of that video until now, seems like a really good resource on its own and probably worth sharing on HN (as a post).

The exact same code sample appears in the talk as in the article.

Yes, the article mentions the sample was taken from the talk.

This was a fun read, but left me with the question: What book on C pointers would be the polar opposite of this one?

I'd like to read that book.

While I haven't read either of them, "Pointers on C" and "Understanding and Using C Pointers" both have good reviews.

Make sure to click the link at the end of the article for code samples that can produce "Segmentation fault (core dumped)" in 4 lines.

Even more concerning is that the book seems to have some positive reviews on Amazon(!), and just one shredding it.

The example given at the top of the article can cause a segmentation fault. See the other comments here for how.

Returning out-of-bounds pointers will do that. I was more impressed by the range of errors in his 4-line examples - the sort of thing you'd expect from a struggling student, not a tutor or author. :)

This reminds me of the first textbook I tried to learn C++ from. It was fun to read, because the authors seemed to put more energy into writing Limericks than explaining object-oriented design. I don't think it touched on inheritance, and things like, I don't know, inline functions or templates were not mentioned at all.

It did not even contain working example programs, let alone exercises. It was so bad, as the saying goes, it was not even wrong. I still have that book on my shelf as a reminder to not blindly buy the first/cheapest textbook I can find.

Oh wow, I have the "Going From BASIC to C" book! I remember reading it first because I knew Apple BASIC really well. I also had K&R. Looking back, I wonder if learning C was made harder or easier by that book…

I've found that it's usually those with embedded systems experience who are most knowledgeable about pointers, but I suppose BASIC experience with embedded systems doesn't count - the ones I'm referring to usually started in Asm/C, and others I know who started in Asm (not necessarily for embedded) are also extremely good at pointer use.

The modern equivalent would probably be Arduino experience. I wonder if there are similar examples in books out there about C++ written by someone with only that...

I took a look at some of the transcribed code examples. Understand, I consider myself a novice at C, one who's just starting to get a clue about pointers. But reading the code examples, more than once I found myself going, "Wait...what?".

I briefly tried to learn C++ and C in the 90s. I'm somewhat glad I didn't find this book in the library. I think it would have made attempting to learn harder, or given me some bad and dangerous habits.

Apparently Mr Traister has written a number of other books - https://www.amazon.com/default/e/B001H6UPHY/ - according to Amazon...

Made me wonder if the quality of the content in the other 11 books is anything like the one the Woz took apart.

It's not Steve Wozniak. They just happen to share the same last name. From https://wozniak.ca:

> I’m Geoff Wozniak, just one of those persons on the Internet. My blog is hosted here, but not much else at the moment.

Thank you!

I have this book, a pristine first edition, with maximum wrongness. It's pristine because I bought it when I started learning C in 1990, and never opened it after my first read through.

Usually, that kind of venom in C book reviews is reserved for books written by Herbert Schildt ;-)

Bad reviews are always so much more fun to read than good ones.

To review your comment: "The author's well meaning and accurate comment is that well-written negative reviews are typically far more fun to read than positive ones of any quality -- in the positive case you usually want to simply read the book, while the implication of the comment is that the pleasure of reading the negative ones is a delicious Schadenfreude.

"Regrettably, this insight was obscured by a regrettable poor choice of terminology ("bad" and "good"). The comments, azernik, has enough HN karma to suggest that this error ought be assigned to the casual, off-the-cuff nature of internet commenting, and that this comment is not up to the usual work (i.e. comments) of the author."

"Regrettably" and "regrettable" in a single sentence, "comments" instead of "commenter", missing "to" after "ought", lowercase "internet"; all that in a single paragraph. The error density in gumby's English is astounding.

Short trumps correct.

Best thing about Fifty Shades of Grey: this review: https://www.goodreads.com/review/show/340987215

WTF is this "for (x=y; ...)" part?

That's just standard C. It was not really part of the WTF.

But y is uninitialized - am I missing something?

Yes. The line right above it:

    y = strlen(r);

Author of book is a Markov chain?
