Learn C The Hard Way (learncodethehardway.org)
419 points by mcantelon on July 3, 2011 | 244 comments



Ohloh says I've changed at least half a million lines of C code (https://www.ohloh.net/accounts/rhp/positions/total). Play me a tiny violin ;-)

What kinda bugs me is that whenever people go to teach C, they make out like it _has_ to be a low-level exercise, as if writing in C suddenly means you can't use abstract data types or object-oriented style or name your functions properly or have Unicode support.

For example, people teach libc string APIs like scanf() and strtok(), which should almost never be used. (See http://vsftpd.beasts.org/IMPLEMENTATION for one take.) Instead, use http://git.gnome.org/browse/glib/tree/glib/gstring.h or write your own like http://cgit.freedesktop.org/dbus/dbus/tree/dbus/dbus-string....

If you're going to display user-visible text, you are pretty much required to link to GLib or another Unicode library, since libc doesn't have what you need (unless you want to use the old pre-unicode multi-encoding insanity).

Don't use fgets() and other pain like that, use g_file_get_contents() perhaps, or another library. (g_file_get_contents is in http://developer.gnome.org/glib/stable/glib-File-Utilities.h...)
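
For concreteness, a minimal sketch of reading a whole file with g_file_get_contents (the file name is made up; compile with pkg-config's glib-2.0 flags):

    #include <glib.h>
    #include <stdio.h>

    int main(void) {
        gchar *contents = NULL;
        gsize length = 0;
        GError *error = NULL;

        /* reads the entire file into a newly allocated buffer */
        if (!g_file_get_contents("config.txt", &contents, &length, &error)) {
            fprintf(stderr, "read failed: %s\n", error->message);
            g_error_free(error);
            return 1;
        }
        printf("read %lu bytes\n", (unsigned long) length);
        g_free(contents);
        return 0;
    }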

You need help from a library other than libc to deal with portability, internationalization, security, and general sanity.

Maybe more importantly, a library will show you, by example, that in C you can still use all the good design principles you'd use in a higher-level language.

I told someone to "use a string class" in C recently for example, and they said "C doesn't have classes" - this is confusing syntax with concepts.

C requires more typing and more worrying about memory management, that's all. It doesn't mean that all the best practices you know can be tossed.
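
As a sketch of what I mean by a string "class" in C (made-up names, far simpler than GString or DBusString, and with malloc error checking omitted):

    #include <stdlib.h>
    #include <string.h>

    /* the struct is the data, the functions are the "methods" */
    typedef struct {
        char   *data;
        size_t  len;
        size_t  cap;
    } MyString;

    void my_string_init(MyString *s) {
        s->cap = 16;
        s->len = 0;
        s->data = malloc(s->cap);
        s->data[0] = '\0';
    }

    void my_string_append(MyString *s, const char *text) {
        size_t n = strlen(text);
        while (s->len + n + 1 > s->cap) {
            s->cap *= 2;
            s->data = realloc(s->data, s->cap);
        }
        memcpy(s->data + s->len, text, n + 1);
        s->len += n;
    }

    void my_string_free(MyString *s) {
        free(s->data);
        s->data = NULL;
        s->len = s->cap = 0;
    }

More typing than in a higher-level language, but the same encapsulation idea.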

There's a whole lot to be said about how to write large, maintainable codebases in C, and it can even be done. It's not something I would choose to do these days, but it can be done.

One other thought, two of the highest-profile C codebases, the Linux kernel and the C library, have extremely weird requirements that simply do not apply to most regular programs. However, a lot of people working in C or writing about C have experience with those codebases, and it shows.


The Linux kernel, weird requirements and all, actually represents an excellent example of using C well. It has high-level functions, objects, classes, abstract data types, interfaces, and many other useful abstractions to make C almost comfortable.


Agreed, but it's still sort of a weird case, I think. The considerations related to things like memory allocation, performance, concurrency, internationalization, security, IO, etc. are pretty different in the kernel.


C shouldn't be used for anything other than low level programming these days. There are other higher level programming languages that would make your job a whole lot easier. Some programmers, like me, are stuck maintaining legacy C application code, but I wouldn't wish that fate on new programmers looking to learn C as another tool in the belt.


Maybe "low level" is a bit ambiguous.

I certainly agree that C is most appropriate when you are "low in the stack," just above the operating system and maybe implementing something like a virtual machine. I don't make a habit of writing stuff in C for no reason and the vast majority of programming these days isn't and shouldn't be in C (or C++ for that matter).

However, "low in the stack" is different from "low level" like "I refuse to use modern practices" or "I get to omit half the letters from my function names." You can be coding an on-the-metal kind of thing and still think about it in a high level way.


Right, agreed. :)


Other uses of C besides low-level programming are:

1. Building static libraries.
2. Building shared libraries (.so) and DLLs.
3. Binding those .so/.dll files to higher-level languages like Ruby, Python, Perl, and PHP.

That is my main use of C these days. I always write the I/O and user-interface code in higher-level languages and put all the processing code in a .so or .dll written in C.
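
A rough sketch of that split (made-up function name; the scripting side would load it with something like Python's ctypes):

    /* process.c -- the processing code, built as a shared library:
     *   gcc -std=c99 -shared -fPIC -o libprocess.so process.c
     * A Python caller could then do ctypes.CDLL("./libprocess.so"). */
    double sum_squares(const double *values, int count) {
        double total = 0.0;
        for (int i = 0; i < count; i++)
            total += values[i] * values[i];
        return total;
    }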


I've done a bit of this in Haskell, and it can be remarkably clean. It can also be a right pain, though.


Yup. Same here.

Not having to deal with exceptions and STL libs makes life a bit easier.


Of course, in most languages you can use most programming language features by "typing more", but when a certain feature is not provided by the language or a standard library or at least a de facto standard library, everybody will implement it in a slightly different, incompatible way. I assume that is what people mean by "C doesn't have classes".


I'm glad you got so much about how my book is written from a few paragraphs in an unfinished manuscript for it.


You should stop thinking everyone is attacking you. I think you'd be happier.

Really though, there's no content here. The fact it's been voted to #1 on hackernews shows just how bad things are.

Items should be upvoted on their merit, rather than who did them. Someone writing another book about programming C isn't newsworthy.


Someone writing another book about programming C isn't newsworthy

However, "someone who wrote a ground-breaking introductory book in xyz modern-popular-hip language is turning his attention to C, of all things" is newsworthy.


I had to stop reading at "ground breaking". I'll assume you're being sarcastic and leave it at that :)


I don't really understand your comment.

1. The post is about Zed's draft; is it really that much of a leap to interpret hp's remarks as a criticism of the book?

2. Is this a fair paraphrase? "Everyone here isn't attacking you, also, your draft sucks and shouldn't have been posted on HN."


I think hp is making more of a comment about the general state of C books rather than this one.


' Is this a fair paraphrase? "Everyone here isn't attacking you, also, your draft sucks and shouldn't have been posted on HN."'

No, it's not. speckledjim is complaining that someone writing another book on C is not news, he doesn't say anything about the draft sucking.


Implied by "Really though, there's no content here." Maybe they meant the comments, though.


I think he's referring to this topic, rather than the draft.


1. I agree. 2. He didn't post it, someone else did, and no, I don't think it should have been posted.


If you know a really good beginner-oriented C programming book that is not K&R I would love to see it.



"practical c programming" is not bad: http://amzn.to/m6SOB5


I was just going off on a generic tangent, obviously I don't know what your book will be like. Just talking about C since it's 1am and vaguely on-topic.


He is not criticising you; he's criticising what others that undertook this kind of project have done.


hey, you don't know me or shite, but you're a fucking cool guy. probably the most generous dude i've ever not met. cheers!


Hey, Zed, can you tell us all how much we're losers because we don't get as many blog post hits as you? The irony was a big hit last time.



I always had an idea of a 'for programmers' series of books for those who know how to program in one language (say, C++, Java or PHP) and wish to pick up a new language

eg. 'python for programmers' would not need the first half of it dedicated to explaining strings, loops etc. and could get straight into it from a programmers perspective - a bit like K&R. You could then dedicate more content to explaining philosophy, design decisions, internals, history, politics (learn all the in-jokes ;)), etc.

this would also be a good format to learn new paradigms, eg. 'functional programming in scheme, for programmers'


Indeed, I currently get turned off by most books I pick up, simply because they start out at too basic and general a level.

If, however, you dig a little deeper, you can usually find stuff that isn't too tutorial-ish. Like http://www.c-faq.com/top.html which has been an extremely solid resource for me.


I'm hoping to make this book ramp up faster than LPTHW, since I'm assuming people have either read that book or know one programming language already. However, I'm also a big proponent of practicing the syntax even if you think you're an expert already. It just makes things way easier later on.


Personally, I hope that this book will work for people who don't already know a programming language. C seems like a great first language to learn if only because (if taught well) it exposes a large number of the underlying details of how programs actually run on a system.


Moving from Python to C is a very good move. It allows you to go from programming to inner workings; everyone should have a solid knowledge of C, since it gives you the tools to identify and correlate aspects of other languages to reality.

I don't understand what "practicing the syntax" means (language barrier?), but I'm not suggesting that I've stopped learning, which I never will.


Different people learn differently. Some people don't like to practice the trivial stuff, and prefer to breeze through the beginning, read through new material, and then take on some real project.

Rich Hickey, recently in an interview, said he doesn't do programming exercises - he is not interested in programs which don't make the computer do something useful, interesting, or both.

I, for one, type the trivial examples. Or else, I simply forget how to use them. For example, you have list comprehensions in Python, Racket, F# and if you ask me to do a list comprehension right now without looking up the reference, the only one I will get right is Python. I recently started learning F# and didn't type much code; so the concept is known, but since I didn't practice the syntax, I will have to learn it again.

That's what he means by practicing syntax.


Got it.

I wouldn't say, however, that the complexity of C is its syntax (unless you're playing around with macros : ) ). But I guess the concept can be applied broadly, like remembering functions, headers and other language specifics.


how about 'if you are proficient at another language, feel free to skip to chapter x'? and put the generalized stuff at that point


If you really are that proficient, you should be able to figure out what you can skip.


I'm not so sure about that. The devil's in the details - you might think you know something well enough to skip it, but get caught out later.


Sometimes due to language discrepancies, skills don't transfer and there is no easy "feel free to skip to chapter x".

For example, a conversant Python programmer's idea of a string and strings in C are 2 different things. A conversant C programmer's idea of looping is different from idiomatic looping in Racket (and other lisps).


actually you are right, it completely doesn't work in that example


"eg. 'python for programmers' would not need the first half of it dedicated to explaining strings, loops etc."

Mark Pilgrim's http://diveintopython.org/ is what you describe.


While not a book, the Ruby website does have a "Ruby from Other Languages" page that seems similar to what you describe.


> this would also be a good format to learn new paradigms

Or you could have books dedicated to a paradigm, such as functional programming, and have examples in many languages that all serve to drive home a specific point (this is how you make something tail-recursive, perhaps), pointing out how each language is both similar and different.

"The Practice of Programming" by Kernighan and Pike is somewhat close to what I want, but is not it.


I know what you mean, it took me multiple books and articles to pick up functional programming. 'Seven Languages in Seven Weeks' was a good book - as is the K&P book you mention, but something complete that can teach an existing non-func programmer from the ground up, possibly using multiple languages, would be great


When I first learned C, I did so under DOS, an environment in which you could declare a pointer to video memory, make a magic call, and start drawing graphics. I found that a memorable way to learn pointers. Sadly, doing that in the modern world requires quite a bit more setup.


You can do something similar now using SDL. In fact, unless you are blitting bitmaps or using additional libraries, it is the only way to get stuff on screen. SDL gives you a pointer that corresponds to the video memory and you stick colors into it.

Here is a tutorial for using SDL for 'old school' graphics programming:

http://sol.gfxile.net/gp/

Of course this would require the student to install SDL and write a makefile to link it all together. I feel that is something worth covering though, since most books just do a bunch of hand waving when it comes to linking external libraries.
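
For reference, a minimal sketch of the SDL 1.2 style that tutorial uses (assuming a 32-bit surface; error handling and the event loop trimmed; build with sdl-config's flags):

    #include <SDL/SDL.h>

    int main(void) {
        SDL_Surface *screen;
        Uint32 *pixels;
        int x, y;

        if (SDL_Init(SDL_INIT_VIDEO) != 0) return 1;
        screen = SDL_SetVideoMode(320, 240, 32, SDL_SWSURFACE);
        if (screen == NULL) { SDL_Quit(); return 1; }

        if (SDL_MUSTLOCK(screen)) SDL_LockSurface(screen);
        pixels = (Uint32 *) screen->pixels;   /* the "video memory" pointer */
        for (y = 0; y < 240; y++)
            for (x = 0; x < 320; x++)
                pixels[y * (screen->pitch / 4) + x] =
                    SDL_MapRGB(screen->format, (Uint8) x, (Uint8) y, 128);
        if (SDL_MUSTLOCK(screen)) SDL_UnlockSurface(screen);

        SDL_Flip(screen);     /* unlike DOS, you must flip to see the changes */
        SDL_Delay(2000);
        SDL_Quit();
        return 0;
    }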


When I said "quite a bit more setup", I had SDL in mind. :) Yes, you can get a raw pointer to a graphics buffer via SDL, but in addition to the extra setup to get a window, you have to make extra calls (such as SDL_Flip or SDL_UpdateRect) to make your changes visible.


This is how I learned C as well (demo coding), and a book like that makes a lot of sense IMHO: learning C with computer graphics. Now this would probably be SDL based, but as long as the user receives the initial help to get started with a simple Makefile and the calls to get a window, it will be a lot like the old times of int 10h.

Cheers, Salvatore


That's how I was taught assembler. One of our first assignments was to draw a circle on the screen. Second was an etch-a-sketch like drawing programme where you controlled the pixel via kbd. We used int 10h[0] a lot

[0] http://en.wikipedia.org/wiki/INT_10H


Hmmmm, I wonder if you could do that in a similar way...


Zed, your level of productivity is truly inspiring. The best of luck to you.


I think the level of self-promotion and/or other-people-promoting-him is the key factor. Not really the supposed productivity. I know many people who get a lot done, both for day jobs, for contract work, as a hobby, as entrepreneurial ventures, etc., but 99.9% never hit the front page of HN on a regular basis like Zed's activity seems to. Which doesn't lessen what he does and his skills, but it does pull back the camera a bit to put it into a broader context. Not everyone self-promotes to the same extent as Mr. Shaw, or has other people promote them.


> Not everyone self-promotes to the same extent as Mr. Shaw, or has other people promote them.

I don't know much about Zed. I've seen his name and articles here on HN, of course. I read your comment and was curious. I found nothing but a URL in his HN profile and zero submissions attributed to him here. From my very brief scan of his comments on HN posts, it seems like they are on topic. He's not hijacking threads.

So here on HN, at least, it appears that others are doing the promoting.

Zed seems to write original essays and he seems to have strong feelings and opinions on topics that interest the hacker crowd. And dude's got a memorable name.

It reminds me of SEO strategy: Write original, compelling articles and you won't need SEO.


I personally have no idea why this stuff hits this site that often. Take this for instance: It is a half-finished manuscript that I announced in a tweet to people who follow me on twitter and asked for it. Already at the top of this set of comments is a dickhead saying he's such a bad ass 'cause he's changed "half a million lines of C code" and he thinks I'm not writing the book correctly because I'm being too low level. He gets all of this from three exercises and three paragraphs in the introduction.

Frankly, I consider hackernews kind of irritating and not useful as a promotional tool. It amounts to about 5% of my traffic for 1 day at best, and I only have to be here because if I'm not answering the petty little idiots who troll here then the rumors spread through my professional circles.

In other words: I could live without this kind of promotion, so it's definitely not "self-promotion".


You're the dick Zed. The guy (hp) barely mentioned what you were doing at all and was just offering some general advice on C. You take it like a personal attack on your manhood.

You're always defending yourself against dreamed up conspiracies. You have a serious martyr complex.


Sure he's a dick, but you know he's a dick.


Why is it that every comment I read by you is either you whining about being treated unfairly or you attacking someone who supposedly "attacked" you? Are you honestly that insecure? Do you read every comment as though it is in some way attempting to degrade your image?

Here you denigrate the HN community, saying it doesn't generate a significant amount of traffic for your site or whatever, and then go on to say you won't respond to the petty "idiots". I see. You're too "professional" for that. Obviously HN is so meaningless to you that you needn't bother with it. Yet you're here, commenting, making an ass out of yourself.

Woe is poor Zed Shaw; always the victim.


A quick look through Zed's comment history tells me that the % of posts he has made that could fit your description is roughly 1%.

So I'm going to go with you having a selection bias.

OTOH I do appreciate your vigorous defense of the HN community; too many people just don't care about implied insults to their self-identified tribal group.

Go llambda!


You're probably right. I've only seen a couple of his recent comments in threads related to his projects or site (and that thread about the GitHub community). So I no doubt have a selection bias; I suppose it's worth noting his reputation precedes him, deserved or otherwise.

Nonetheless it's disheartening to see self-proclaimed "professionals" (I use this term loosely, for while a person may be employed professionally, that is no guarantee the individual will behave professionally: they are distinct things) who are well known throughout the community behave like trolls. And even more so when they decide it's appropriate to take a shit on this community.

Well I will always try my best to preserve what I value. It's in my nature I guess. So thank you. :)


heh. His reputation appears to be a many faceted thing.

I, for instance, have been genuinely impressed with his productivity and his output, his choice of projects and his code and am moved by that aspect of myself that values quality to think very highly of him.

You, on the other hand, appear to feel that the most important aspect of his output is how well he lives up to the standards that you have set for him - judged by how he expresses himself towards those he feels are being disrespectful or rude to him on the internet.

Just out of interest, do you believe that making a strongly negative personal judgment based on a 'fact' that is clearly incorrect and could be easily checked with just a few clicks is the act of a professional? do you meet your own high standards?


First I will call out your blatant fallacy: tu quoque. Now let's review things:

I readily admit I'm a hypocrite. But also note, I don't claim nor ever claimed to be a professional; I'm not. I'm not at the helm of projects that are useful to and used by many people. I haven't written books on the topic of computer science. Nor have I given talks or do I run a website that receives a large volume of traffic. Maybe that qualifies [him] as working in the domain of a professional, yes?

Although he may have achieved great things, this doesn't give him a license to troll HN and it doesn't excuse him of baseless personal attacks. The comments he made that I replied to were no less than that, at best. At worst, they were insight into his character. Certainly we can hope the latter is not true. Nonetheless, there's no place for that kind of bullshit; it's inexcusable, I don't care who you are. I made such a judgement because he clearly attacked the user "hp" who had done nothing more than make an observation about texts that introduce C in general. Zed's reaction was puerile, irrational, and even paranoid. I hope you aren't standing up for such behavior?


"Maybe that qualifies [him] as working in the domain of a professional, yes?"

Either you think he is a professional, or you do not. You appear to be holding up a standard, claiming that you believe he fits the criteria needed to be judged by it, and then lambasting him for not living up to the standard you have set. Overall, I find this somewhat confusing, but possibly I don't have sufficient context to judge.

TBH I'm not particularly motivated to 'stand up' for the behavior of anyone I don't know.

I am interested in what makes you so interested in casting judgment on Zed, as opposed to on yourself?

Whatever puerile, irrational and paranoid responses Zed may have produced, you seem to be matching them with a self-admitted hypocrisy, a large amount of self-righteous vitriol and a bewildering statement of tribal affiliation to an anonymous internet discussion forum that apparently requires your outraged protection lest it collapse completely under the weight of a misunderstanding between Zed and another participant.

Zed has, to my knowledge, done you no harm of any kind. He certainly represents absolutely no realistic threat to HN.

Why do you feel justified in abusing him in this fashion?



dude, I heard you before.

It doesn't apply in this case. I will leave it as an exercise for the reader to discern why.


Zed Shaw, the Kurt Cobain of programming.


hey, I was just idly chattering about C and the way people in general often approach it. Not intended to be a review of an unwritten book or imply that you plan to approach it in any particular way.

I am a bad ass of course. But I thought it was relevant to the comment that I've written a lot of C.


When your second sentence is: "Play me a tiny violin ;-)" it isn't just idly chattering. It comes across as extremely petty and derogatory.


When somebody directs "derogatory" remarks at themselves, it's called "self-deprecating". It's a good thing in many Western cultures at least, and demonstrates a degree of modesty. In this case, hp made it clear that he doesn't expect sympathy (or applause or whatever) for having written a lot of C. It's just something he mentioned because he thought it was relevant.

It is relevant - people not long out of school and talking at length about everything that's wrong with XYZ are tedious both to experts and experienced people generally. hp's comments on C use experience, clear reasoning, and absolutely no personal commentary whatsoever. This contrasts with the negative, personal and poorly reasoned attacks of people such as you.

I hope for your sake that you're asserting someone is "extremely" petty based on a mere self-deprecating aside because you're only a few years out of school and haven't learnt to communicate yet, and not just an old hand trapped in an inability to relate to people with different opinions or role models. The world can always do with more talent and fewer martyrs.


"When somebody directs 'derogatory' remarks at themselves, it's called 'self-deprecating'... In this case, hp made it clear that he doesn't expect sympathy (or applause or whatever) for having written a lot of C. It's just something he mentioned because he thought it was relevant."

This isn't the only possible interpretation of that comment. Sometimes words can have more than one meaning, and misinterpretation isn't always a sign of idiocy or acting in bad faith

Also, when you say, "This contrasts with the negative, personal and poorly reasoned attacks of people such as you," do you mean, "I hope for your sake that ... you're only a few years out of school and haven't learnt to communicate yet, and not just an old hand trapped an inability to relate to people with different opinions or role models"?


"This isn't the only possible interpretation of that comment"

Genuine curiosity... what other interpretation could there be? Is it an ESL issue with literal translation causing offense? Or is there another interpretation in a non North American culture?

In N.A. at least, this is an extremely common expression.


By "that comment", I meant hp's original comment, which was a long tangent on how to go about teaching C, in the context of a post about a draft of a book teaching C. The piece that X-Istence isolated contributes to the interpretation that the comment was a criticism of Shaw, and not just a long tangent. It can be read as setting a confrontational tone.

I don't understand why other commenters feel it was unreasonable of Shaw to interpret it as a criticism.

I think it's been on the order of a year since I last heard this particular violin phrase. It might be common in your community, but that's not exactly North America, yeah?


  I don't understand why other commenters feel it was 
  unreasonable of Shaw to interpret it as a criticism.
To speak for myself: because the other interpretation makes much more sense, if you take into account that people are generally trying to be nice and helpful.

I was reading that comment, thought "hmmm, that criticism sounds a bit premature and more based on what others have written than on what Zed is writing... oh wait, that's because he isn't saying anything about Zed's writing at all. Yeah, now it makes perfect sense."

So admittedly, you get confused for a moment, but then there's a perfectly obvious solution: he's not commenting on the linked article, but is offering advice in the form of criticism on what generally goes wrong with these projects. If you don't see that solution and don't go for that interpretation, I think it means you fail to consider the option that someone is just being clumsy at being helpful. So it's another instance of Hanlon's razor: don't attribute to malice what could equally well be explained by stupidity (which is of course much too strong a term for an awkwardly phrased answer).

BTW, it's not right that your comments are being downvoted.


Yes, I agree that Shaw didn't apply Hanlon's razor here, and ought to have.


> I personally have no idea why this stuff hits this site that often.

May I clue you in? You create interesting quality projects like Mongrel, Mongrel2, Tir, and LPTHW. You are also entertaining.

If that's self-promotion, let's have more of it.


I’m saddened by the number of grown men I meet who worship guys like the persona found in this rant. Too frequently men (especially younger men) will by default listen to whoever “talks tough” rather than the people who make the most reasoned arguments. They will listen to blowhards and pundits all day and blindly follow their “leadership” on fad after fad, never really questioning whether these people are worth listening to in the first place.

-- Zed Shaw, backpedaling when his angry schtick gets called out by the community


can you name 2? doubt it...


I'll be interested to see how this compares to K&R, which not only teaches the C language but also C idioms and the reasons for using them. K&R is still one of the very best programming books. Every other C book I've read is inferior. Peter van der Linden's "Expert C Programming" is the only book on C besides K&R that I've learned anything from.

Good luck, I'm all for more programmers understanding C but I wonder if the wonderful days of programming close to the hardware are ancient history. "[P]eople are deathly afraid of C thanks to other language inventor's excellent marketing against it." Maybe, but I think the raison d'être for C is not apparent to programmers who started with Java, Python, or Ruby.


K&R won't work for someone beginning programming. This book is specifically targeted towards teaching programming to programming virgins.

The examples and exercises K&R uses will be very hard for beginners. When it builds a recursive descent top down parser to read the declarations in English (and vice versa), that will be totally lost on the beginners.

K&R wasn't written for beginners and I doubt it will work well for a programming beginner - should be fine for someone who already knows programming in some other language.


Nope, this book isn't for total beginners. I mean, it might work, but I'm actually telling the total newbs to go read http://learnpythonthehardway.org/ first. My thought is teaching "computational thinking" is easier with a language like Python, so people should start there.


I knew time-shared BASIC before learning C in 1978. I was definitely a beginning programmer then, still in high school. Many (probably most) of the programmers I know from the same era learned programming from books that would be dismissed today as too hard for beginners. In the preface to the first edition Kernighan and Ritchie wrote: "This book is not an introductory programming manual; it assumes some familiarity with basic programming concepts like variables, assignment statements, loops, and functions. Nonetheless a novice programmer should be able to read along and pick up the language, although access to a more knowledgeable colleague will help." Of course that was before the internet, when access to a "more knowledgeable colleague" was harder to come by. In 1978 the only other languages someone would know were Fortran, BASIC, COBOL, Pascal, assembly language, etc.

Chapter 1, A Tutorial Introduction starts at the same place Zed's book starts: hello world. Looking at it now I don't see any reason my 13-year-old son (who knows a very little bit of Python) couldn't learn C from K&R with me explaining things here and there -- the same way I learned C. The recursive descent parser doesn't come up until the end of chapter 5 (out of 8), right after quicksort is implemented with pointers to functions. By then the reader has built up some skills and presumably developed enough curiosity to refer to other books for more explanation of sorting and parsers. For me those were Knuth's "The Art of Computer Programming," especially the third volume with all of the great example code which I busily translated to C.

If you think K&R is too hard for beginners consider how many programmers learned programming from the badly-typeset and somewhat inscrutable "Pascal User Manual and Report" (1974). That was the programming 101 text used at the local university when I was there.


After arguing I read the preface to Zed's book. He wrote "LCTHW will not be for beginners, but for people who have at least read LPTHW or know one other programming language." It seems LCTHW addresses the same audience as K&R after all.


Do people really think C is some mysterious, inscrutable language?

"To many programmers, this makes C scary and evil."

Is this actually true for people? I find C code generally very easy and straightforward to understand; there's not any magic behind the scenes, like there is in any language that's more "high level" than C.


Yes, actually people are deathly afraid of C thanks to other language inventors' excellent marketing against it. There are a few things that you need to really nail to be good at C, and they're difficult things, but it's not "dangerous" like people keep claiming.


C is difficult, yes. Dangerous - I don't know in what sense. I manage to get more exceptions in my Python, Ruby code, owing to undefined variables or incorrect types, than I get segfaults in C.

The main issue with C is it takes some time before you are ready to take it head on. An experienced C programmer will have his repertoire of generic data structure libraries with time complexity guarantees (programs without hashes, expandable lists and operations on them are a pain), will know how to properly use function pointers to do that dependency injection thing other programmers are raving about, separate interface from implementation, know the build environment, know how to use structures and function pointers to build abstractions, etc.

But before that, C takes much work to produce little. For someone starting programming, the learning curve is steep. Or more like, the gratification is really, really delayed. It takes some time before he can take on a real world project (it does in the high level languages as well, but the initial progress is faster).
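
A small sketch of the function-pointer "interface" idea (made-up names), since that's the part that tends to look foreign at first:

    #include <stdio.h>

    /* the interface: callers see only a struct of function pointers */
    typedef struct {
        void (*write)(void *ctx, const char *msg);
        void *ctx;
    } Logger;

    /* one implementation, hidden behind the interface */
    static void write_to_stderr(void *ctx, const char *msg) {
        (void) ctx;                    /* unused in this implementation */
        fprintf(stderr, "%s\n", msg);
    }

    Logger make_stderr_logger(void) {
        Logger l = { write_to_stderr, NULL };
        return l;
    }

    /* code that depends only on the interface -- the "dependency injection" part */
    void do_work(Logger *log) {
        log->write(log->ctx, "working...");
    }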


What I keep repeating about that is: why don't the C standard guys freaking update the libc with a new version of the C standard? Deprecate the old silly stuff like strcat() (the libc is full of bad calls) and add lists, hashes, btrees, a good dynamic strings lib, and so forth. A huge step forward for C... without even touching the core language.


A bit OT and not sure if you will see this.

Seeing that you are the guy (or one of the guys) behind redis, which is written in C, do you have any recommendations for generic data structures and operations on them?

I personally have a trivial vector implementation which resizes when full, and a red-black tree implementation for associative arrays. Both of them work fine for my purpose - does the job, good locality of reference, generic over void*.

I have seen glib but largely neglected it because I only need a very small part of it.


Git is a good place to rip that kind of stuff out of.


"I manage to get more exceptions in my Python, Ruby code, owing to undefined variables or incorrect types, than I get segfaults in C."

What happens when you don't get a clean segfault is what got C the "dangerous" reputation.


If you are developing, compiling with `gcc -g` (or CFLAGS += -g) and loading the core with gdb does give a very good idea of what went wrong.


It is a lucky scenario when you actually get a core, and one taken at exactly the right time. Unfortunately there are some pointer bugs that are hard to find even with Valgrind.

Buffer overflows do not produce cores, they just sit there until a determined cracker makes use of them. And a dangling pointer might still access memory that looks valid both to OS and memcheck.


The negative connotations of the word aside, it seems perfectly reasonable to call C dangerous. Without that "danger", you couldn't write an OS kernel, a language runtime, or various other low-level code we need on our systems; on the other hand, when writing higher-level code that doesn't need that level of raw access, it seems useful to remove the possibility of a broad class of mistakes.

See also Rusty Russell's rules about "easy to use versus hard to misuse". I find C easy enough to use, but also easy to misuse.


"Danger" minus the negative connotations is best pronounced "power".


gets()?


With great power comes great responsibility. ;)


Many of the friends I made in my CS classes were terrible with pointers. I never really understood why they didn't grasp pointers, but it was a major stumbling block for them in C/C++


    I never really understood why they didn't grasp pointers
The root of the problem is the language designers' loose use of star: "star something" means one thing at declaration and has a different meaning the rest of the time.

    #include <stdio.h>
    void eg(int i) {
        int *j = &i; // "huh? Put the address of i into *j?"
        *j = *j + 1;
        printf("%d\n", *j);
    }
    int main() {
        eg(4);
    }
With more detail. In the line..

    int *var = something;
.. the system assigns to the pointer. Yet in..

    *var = 6;
.. it assigns to the contents of the pointer.

Common usage creates further room for confusion:

    int *var; // <- this is what people write
    int* var; // <- instead of this
If they'd made the syntax ".int var", and then used * solely for dereferencing, people wouldn't have these problems learning pointers. Consider

    #include <stdio.h>
    void eg(int i) {
        .int j = &i;
        *j = *j + 1;
        printf("%d\n", *j);
    }
    int main() {
        eg(4);
    }
Further confusion comes from (1) special arrangements around string declaration and (2) printf's use of %s to expect a string pointer when %d and %f expect a (non-pointer) simple int and simple float.

    char* something = "huh? so now this does go into *something?";


That's because of the (visually) 'wrong' use of * (wrong in the sense that it's unintuitive). The key is that * is part of the type of the declaration, not of the variable; an int* is not an int. So

  int *j = &i;
is more correctly expressed and easier to understand when written like

   int* j = &i;
The only reason to put the * in front of the variable name is when declaring several pointers in one line. So the solution is to only use it in that context, or not to do it at all.


This bothers me so much about C/C++ syntax. int* a, b; should clearly declare two int pointers, not one int pointer and one int.

Stroustrup wrote something somewhere where he explained that int* a; is more appropriate for use in C++ because C++ is supposed to be more focused on types, and int * a; is more appropriate for C because of something about C's philosophy, but I can't remember what. I wish they would have changed the syntax for C++, but I guess he couldn't have while still keeping C++ a superset of C.

edit: found it: http://www2.research.att.com/~bs/bs_faq2.html#whitespace

"A ``typical C programmer'' writes ``int *p;'' and explains it ``*p is what is the int'' emphasizing syntax, and may point to the C (and C++) declaration grammar to argue for the correctness of the style. Indeed, the * binds to the name p in the grammar.

A ``typical C++ programmer'' writes ``int* p;'' and explains it ``p is a pointer to an int'' emphasizing type. Indeed the type of p is int*. I clearly prefer that emphasis and see it as important for using the more advanced parts of C++ well."


To avoid this sort of thing, I use:

    #include <stdio.h>
    void eg(int i) {
        int *j; // "j is a pointer and (hence) *j is an int"
        j = &i; // "Put the address of i into *j? Yup"
        *j = *j + 1;
        printf("%d\n", *j);
    }
    int main() {
        eg(4);
    }

Perhaps this is because I am just used to it, but I really see very little room for confusion here. The common usage avoids confusion, if you do not insist on assignment at the time of declaration.

To address your second confusion, just keep in mind that strings are char arrays and an array's name is actually a pointer. Again, I find this very straightforward.


    I really see very little room for confusion here.
I don't understand how you reach that conclusion. You might understand it, I don't see how you can say there is very little room for confusion.

Yes, if you know about declaration follows use, it makes sense.

Yes, if you "keep in mind that strings are char arrays and an array's name is actually a pointer" then it makes sense.

You can get by by knowing to avoid some constructs.

If you know how the c compiler works, pointers make sense.

If you know C then you know C.

But when you're new to the language, you don't know these things and that's what this part of the thread is discussing.

Another responder wrote:

    The key is that * is part of the type of the
    declaration, not of the variable; an int* is
    not an int.
The grammar is structured as though it's not. Consider this:

    int* c, d;
Since star is part of the type, if the language was designed well then both of them would be int pointers. But in that case, only c is. d is an int. Awful.


>> I really see very little room for confusion here.

> I don't understand how you reach that conclusion. You might understand it, I don't see how you can say there is very little room for confusion.

Please do not put words in my mouth. I wrote "I see very little room for confusion" not "There is very little room for confusion". I tried to make it clear that I was talking about my personal experience; and I was talking about my personal experience since I was hoping it would help, not to defend the syntax of C.

Sometimes a particular point of view allows you to understand something; in some cases it makes the previously mystifying point "trivial" or "obvious". I am sorry that the POV that helped me so much does not help you at all.


I see, and thanks for clarifying.

The point of my earlier post was to describe strong reasons for people to have trouble with pointers.


> if the language was designed well

Do you mean to reinforce that the language is not beginner-friendly, or are you really asserting that makes the language poorly-designed? If it's the latter, you should really at least explore some other factors before making the conclusion. It seems to me that it's a relatively minor distinction once you know it, so from a design perspective that may simply be a tradeoff for some other advantage.


    Do you mean to reinforce that the language is not
    beginner-friendly, or are you really asserting that
    makes the language poorly-designed? 
I was seeking to explain to ramidarigaz why his CS classmates didn't understand pointers.

I think poor grammar is poor design - i.e. part of the type affects both variables (int), the other part doesn't (star).

The other issue is the use of star in one place to mean declaring a pointer, and in others to mean dereferencing one. That was the focus of my first post.

    from a design perspective that may simply be a
    tradeoff for some other advantage.
I've yet to see evidence of any. What sort of things were you thinking about?


If you remember that the experiment here was to make declaration follow use, it makes sense. I consider it a failed experiment, but it does make sense.


I would hazard to guess it's due to two things: 1) a lack of understanding about how the machine works, and in particular the way memory is organized; and 2) a lack of explanation of why anyone would need to use pointers. Introducing pointers in the context of a linked list, or explaining call-by-reference, would probably help make things more concrete.
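
For example, a call-by-reference swap is about the smallest motivating case (a sketch):

    #include <stdio.h>

    /* without pointers, swap() would only change its local copies;
       passing addresses lets it modify the caller's variables */
    void swap(int *a, int *b) {
        int tmp = *a;
        *a = *b;
        *b = tmp;
    }

    int main(void) {
        int x = 1, y = 2;
        swap(&x, &y);
        printf("%d %d\n", x, y);   /* prints "2 1" */
        return 0;
    }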


I have friends who had to take an intro to C++ as their first (and only) programming class.

I heard them talking about passing arguments to functions as "call by reference" many times, and it was obvious they had no idea what they were talking about, just regurgitating what the instructor said.


K&R chapter 5.5, Character Pointers and Functions, explains pointers and shows how they are used in C. It's probably the clearest explanation of pointers ever written, and it doesn't require any understanding of hardware or assembly language (though that would help).

I remember reading those three pages over and over and experimenting with the code until I got it. I probably spent several days studying the pointer chapter back when I was a teenager. The light bulb eventually went on and C pointers made sense -- until learning C my programming experience was mainly with time-shared BASIC. The stepwise refinement of an indexed array version of strcpy() to a pointer-based version is a masterpiece of writing about programming -- I still refer programmers I am mentoring to that chapter.
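
Roughly the progression, paraphrased from memory rather than quoted from the book:

    /* array-index version */
    void copy_v1(char *s, char *t) {
        int i = 0;
        while ((s[i] = t[i]) != '\0')
            i++;
    }

    /* pointer version */
    void copy_v2(char *s, char *t) {
        while ((*s++ = *t++) != '\0')
            ;
    }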

Finishing the strcpy() example with the one-liner

  while (*s++ = *t++) ;
the authors write "Although this may seem cryptic at first sight, the notational convenience is considerable, and the idiom should be mastered, because you will see it frequently in C programs." I've found that programmers either get that line of code or they don't, and those who don't haven't mastered their craft.


Getting it or not is one thing, but I'd advocate keelhauling anyone who wrote that sort of thing in production.

The "notational convenience" is more obnoxious than anything. But, then, I'm not particularly a fan of terseness for the sake of being terse.


I'm not a fan of wordiness for the sake of being wordy. If you think the code is terse just to show off you are missing the point. There's a difference between writing obscure code and writing concise idiomatic code -- see "The Elements of Programming Style" by Kernighan and Plauger. I'd rather Kernighan, Ritchie, Thompson, Pike, et al. write my production code than someone who thinks C pointer idioms are a punishable offense.

"That sort of thing" has been in production in the C libraries, the UNIX kernel, and all of the brilliant utilities that make up UNIX for over 30 years. It's also very much in production code at Google.

You might want to read Paul Graham's thoughts on succinct code at http://www.paulgraham.com/power.html and Rob Pike's "Notes On Programming in C" at http://doc.cat-v.org/bell_labs/pikestyle.


I've read both of those links in the past, thanks. And while, yes, "idiomatic" (read: terse and annoying) code is in use in all those areas, I don't have to see it.

Code should be pleasant to read wherever possible. That simply is not, to my mind, pleasant to read. It's one step away from Perl line noise (which I avoid, too). You may disagree with this, and that's fine--different strokes for different folks. I have no interest in seeing it in code I have to maintain; you might, and that's OK by me.

(To be fair, however, I have little interest in working with or, god forbid, maintaining C or C++ code under any circumstances. They press the buttons of a group of developers to which I don't belong.)


I imagine terseness was a lot more valuable when everyone was programming on an 80x25 (or smaller) terminal or even punch cards. I've seen a few examples of code like this from early libc implementations of functions like strcpy, malloc, etc. Take a look at the source code of BSD libc -- some of that stuff is historic.

I myself never got much into the terseness game, since apart from a brief stint using gwbasic and later QuickBasic at 80x25, I learned programming using DJGPP in DOS with the RHIDE IDE. The IDE could trick the VGA into displaying something like 132x60, leaving plenty of room for descriptive code.


The Go language tutorial and examples look right at home alongside K&R. Likewise jQuery and other Javascript libraries, where terseness is valued due to bandwidth rather than screen size.

The idiomatic C style is so natural to me now I don't see it as a defect to fix or a game I'm playing. It's how I learned to program in C because I learned from K&R, and if they aren't the authorities I don't know who is. When I see wordy and bloated Java-style code it reminds me of the years I spent writing COBOL. Ultimately, though, it's not efficiency or a desire to show off or confuse other programmers that influences my style. To me code that does what it needs to with no extra fluff is beautiful.

When I see

  while (*s++ = *t++) ;
I know what it does -- I don't need any comments or "descriptive" variable names or an explicit test for a null byte to make it clearer. If a programmer comes across a line of code that he or she doesn't understand, the fault may be with the author, but it may be with the reader. In my experience there are a lot of unskilled programmers who quickly decide that any code they look at is badly written and should be thrown out. I don't think I should dumb down my code just so programmers less fluent with C can understand it. I have to consult a dictionary sometimes when I read Cormac McCarthy or Nabokov but I don't think they should write with easier words for my sake.


"Debugging is twice as hard as writing the code in the first place. Therefore, if you write the code as cleverly as possible, you are, by definition, not smart enough to debug it." --Brian Kernighan

Not to mention that other people aren't as smart as you when debugging or understanding it. Also, that code relies on the order of operations, which I wouldn't recommend outside of math because it can be confusing. And it doesn't check that s isn't null.

Java is considered verbose for a number of reasons. It has a number of variable modifiers, it uses long names (which are generally good but can be stupid, especially in some cases of identifying multiple abstraction levels), and it lacks type inference for generics and collection literals. C frequently makes the opposite mistakes, with tiny cryptic or non-whole-word names. C lacks generic programming; it does do a decent job with initializers, but unfortunately those can only be used at initialization.


The point of the example is to explain pointers by starting with a more familiar concept: indexed arrays. It would help if you were referring to the book; the authors note more than once that "the programs are intended to be illustrative, not bullet-proof."

Understanding the precedence and order of evaluation of operators is fundamental to mastery of any programming language (chapter 2.12 in K&R). All C code must "rely on" the order of operations, and if an indirect assignment through a pointer with post-increment is confusing... well that's the point I was making in the first place.


Good quote, but there is nothing particularly clever in the line above. It is in the same league as if (something) or something ||= something_else.


This type of code used to run faster when compilers did less. It makes no difference now. I think that is why it has stuck around. I prefer to write C that uses array references where it makes things clearer.


Don't think so. Kernighan et al. are the first to say code should be clear first and made fast only if it matters. This is clear code. It may (or may not) be faster than an array-based version but that's not the point of the exercise in K&R -- the point is to teach how pointers work.


You'd be surprised what the compiler still doesn't do. For example, when writing a toy high-performance memory allocator for a university class, I was able to gain a significant[1] performance boost by replacing __attribute__((packed)) structs with pointer arithmetic #defines for the allocation block descriptors, with an identical memory layout.

[1] I don't remember the exact gain, but it was at least 10%.


For me, the indirection of pointers was one of the fundamental CS concepts that required real effort to understand.

Before that point, I had never clearly separated the concept of a variable and its value. It took a huge conceptual leap to think about a variable that didn't hold a value, but rather, the location of a value. It took some serious mental gymnastics to deal with pointers n-levels deep.

Mind you, this was actually Perl references, not pointers, so I didn't even have to try to comprehend doing math on them.

After a while, of course, it became second nature.

My favorite aspect of CS is that every so often, I run into a wall of conceptual understanding that requires completely changing how I think in order to move forward. Have you never had moments like this, or were they just different topics?


Actually learning about the stack was one of my big 'aha!' moments. I sorta' knew it existed, but I didn't really comprehend it.


Mainly because variable declarations are "backwards" in C. People try to read code left to right, but C declarations should be read right to left. Pile on things like the dual use of the static keyword, typedefs, and multiple consts in a declaration, and figuring out the type of a variable can be a real chore. Dan Saks has a great article that hits on just the issues with const. http://dansaks.com/articles/1998-06%20Placing%20const%20in%2...
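
A few examples of the right-to-left reading, and of why const placement trips people up (a sketch):

    char c = 'x';
    const char *p = &c;        /* p is a pointer to a const char        */
    char *const q = &c;        /* q is a const pointer to a char        */
    const char *const r = &c;  /* r is a const pointer to a const char  */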


Or the classic "clockwise spiral rule" http://c-faq.com/decl/spiral.anderson.html :)


I can't speak for your friends, but while I had zero problems with pointers conceptually, I still had trouble using them in C. The problem wasn't the simple case of declaring a pointer and taking its reference, the difficulty came when the programs got complex and required agreement between function declarations, actual parameters in calls, and use within the functions.


> Do people really think C is some mysterious, inscrutable language?

Yes, I do. But it's not because of the language itself. It's trying to figure out what you can do with it after you grasp the fundamentals. I taught myself C using K&R a long time ago, but I never did anything with it. At the time I figured there were two paths I could progress along - UI related (e.g., a Windows app) or systems related (something Unixy). Both paths presented large hurdles. Nothing insurmountable, but I wasn't a programmer at the time - just doing it for my own edification. I always thought it would be nice if there was a second level book that took you from post-basics to writing something useful.


That's nothing to do with C as a language, but rather to do with relevant APIs, which can be difficult to read at times.


It was for a while with me. I was a more "mathy" CS major which meant that languages that exposed what the computer was doing were more opaque to me... but I'm not sure that I ever considered it "scary and evil."


I don't intend to argue so much as to offer a data point: I suspect that most folks on this site learned C or something C-like early on, and have internalized its modus operandi. I have been learning C recently from a background of functional programming, and I find it scary and evil.

FWIW, I enjoy programming very much, and have built some non-trivial stuff in several languages. Still, I found C extremely taxing. Not in the sense that I felt it was beyond me, but in that I was fighting or recoiling from the language at practically every turn. In what follows, I am acutely aware that I am nowhere near fully fluent in C yet, and am writing only to offer the first impressions of a student. Nevertheless, in that time I have been able to draw on the advice of several "experienced colleagues", as K&R urge. In that sense, if I misstate the facts, I will be repeating the misconceptions of people who have objectively spent an awful lot of time writing C. I would be glad to be corrected on any point, but I'd also find that kind of symptomatic of the issues I have with C.

Firstly, C is incredibly stateful. I would be glad to learn that I am just Doing It Wrong, but there seems to be no obvious way around stateful manipulations as a way of life. Take, for instance, the fact that arrays are essentially second-class citizens and must be stuffed into functions as (pointers to) extra parameters in order to capture the "function's" "output". I am breaking out the scare quotes here since it is an abuse of vocabulary to refer to a procedure that communicates with the world by modifying its inputs as a function. If you have not cut your teeth on it, it seems almost obscurantist. If I want to multiply A by x and store the result in b, I want to write

b = matrixProduct(A,x);

not

matrixProduct(b, A, x);

as I must in C. I go back and forth over whether this is a deliberate and worthwhile performance tradeoff or just myopic design[1], but I don't want to have to settle for this in code I read every day.

There is also a kind of bureaucratic spirit in much C code, resulting from the fact that one must attend so closely to the how of computation rather than the what. In some contexts, like when you're doing distributed simulations that may run for days and performance is at an absolute premium, this emphasis may be appropriate. In most contexts, however, one finds oneself implementing and reimplementing standard operations by hand. Why should it take four lines to sum an array?
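
(For concreteness, the kind of four-line sum I mean, a rough sketch:)

    #include <stddef.h>

    double sum(const double *xs, size_t n) {
        double total = 0.0;
        for (size_t i = 0; i < n; i++)
            total += xs[i];
        return total;
    }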

One of the most alarming symptoms of this style can be witnessed by watching an experienced C programmer read code. Old hands don't read lines, they scan an entire section of a page at a time. I was awed by this ability until I realized that it is possible only because each line of C does so little. I recall someone (probably in an HN comment) describing the "rhythm" of reading C code. That, to me, is not an encouraging sign.

Then there is, of course, the penance of debugging segfaults and space leaks. What happens when you declare 10 pointers and allocate 9 of them? No one knows, because C doesn't know either. It's up to the compiler. Towers of Hanoi, nasal demons, &c. Even with the help of smart and experienced people, I've never spent so much time diagnosing such trivial, silent runtime errors. Yes, I know it's that way for a reason, but that reason is not legibility or ease of understanding.

In the end, C has performance and relatively (but not exceptionally) compact semantics. The importance of cycle-squeezing is becoming less of a consideration daily, for reasons well-rehearsed on this site and elsewhere. As for C's semantics, I'd much rather spend my time thinking about the transformations I want to map over my data than orchestrating von Neumann machines to carry those transformations out. If your model of computation is something other than register machines, it's not quite cricket to describe the implementation as magic.

-----------------------------

[1] I'd like to qualify that remark in two ways. Firstly, I certainly recognize the brilliance of K&R for working C from the raw conceptual materials of the time. Secondly, language designers in 1969 did not have the benefit of the last 40 years of object lessons in readability. There are many potential languages--points in language space, if you will--that are semantically identical or near-identical but much more readable than the ANSI standard. For an example given by Kernighan himself, postfixing the dereference operator would have done miracles for legibility. Ultimately, a lot of more or less arbitrary choices were made early on and now it's too late to correct them.


C is not the best way to manipulate information, as you have noticed. Its abstractions are all wrong. On the other hand, C is great for computer engineering, as opposed to computer science.

If you are writing firmware, for example, the exact sequence of memory writes is critical. Program the hardware registers in the wrong order, and the device doesn't work. Access the FIFO the wrong way, and your ISR has a data race. Hiding memory access from the programmer is useless when accessing memory correctly is the problem. C is pretty much the only usable language for this kind of work.
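A hedged sketch of the kind of code in question (the register addresses and bit values are made up; real ones come from the chip's datasheet):

    #include <stdint.h>

    #define UART_BAUD   (*(volatile uint32_t *)0x40001004u)   /* hypothetical addresses */
    #define UART_CTRL   (*(volatile uint32_t *)0x40001000u)
    #define UART_ENABLE 0x1u

    void uart_init(void)
    {
        UART_BAUD = 115200u;        /* order matters: configure first...           */
        UART_CTRL = UART_ENABLE;    /* ...then enable. volatile keeps the compiler
                                       from dropping or reordering these writes.   */
    }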

The guys writing kernels and system-level libraries face similar issues.

So, C is stateful because the hardware is stateful. C has raw pointers because the hardware has raw pointers. C doesn't manage memory for you because you don't want C to manage memory for you. This all makes sense when you realize that C was invented for writing operating systems.

So, while I do understand your criticism, I think you are looking at this from the wrong angle. The electrical engineering guys build the hardware, and the C guys make the hardware boot up. Fancy functional programming languages are useless without real machines to run on, and C makes those machines go. It's part of the plumbing, just like transistors. Plumbing may be messy and unpleasant, but even the architects designing skyscrapers need to know how it works.


I fully acknowledge that there are domains where C is the best/only tool for the job. I was specifically taking issue with the claim that C is lucid, and neither scary nor evil.


C is only opaque, scary, and evil if you consider computer hardware itself to be opaque, scary, and evil. For someone like an electrical engineer, C is perfectly lucid, safe, and friendly.


Late to that party, but I'll give my $0.02 regardless.

I won't defend C: it's all that you say. I think one of the acknowledged reasons for C's longevity is that it is impedance-matched for UNIX because they grew up together, and for various reasons UNIX is popular with hackers; therefore C is popular. C is deeply rooted within UNIX, partly because the ABI has been so stable for so long. If you want to write a library for UNIX, you generally target C because anything else will run into a quagmire of cross-compiler incompatibility issues. That means all the good libraries on UNIX are written in C or present a C ABI/API. The easiest language to use a C library from is... C. Or C++. So application writers (and tool writers) have often favoured C as well, although the rise of interpreted languages such as Python, Perl, and Ruby has changed that a bit. Also, the C/C++ toolchains have tended to be more advanced than those of other languages.

So, C is still with us, but for reasons that don't have as much to do with its merits as a language as with its merits as a platform (when coupled with UNIX).


If you're thinking of arrays as second class citizens, you're thinking of the language incorrectly. Rather you should realize there are no arrays, only pointers to chunks of memory and arithmetic operations.


There are arrays. C defines types. When you declare a variable as an array of int, C knows that variable is of type "array of int" and treats that differently than if it had been declared as "array of pointers to int" or "array of char" or "pointer to array of pointers to pointers to int".

I spent many years believing it when people made exactly the assertion you have (see my other posts on this article), but it wasn't until I tried to build a C compiler myself that I realized how wrong it is to think of arrays this way. Yes, C gives you the power to reference memory in a more or less arbitrary way. That does not mean that the arrays you declare are not arrays.


You're wrong; there ARE arrays: make int a[16], *aa; and compare sizeof(a) with sizeof(aa). The confusion arises from the fact that the VALUE of an array is a pointer to its first element. When you consider that C is a pure pass-by-value language, everything fits nicely into place.
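For the skeptical, a quick check (the sizes printed are typical for a 64-bit platform, not guaranteed):

    #include <stdio.h>

    int main(void)
    {
        int a[16], *aa = a;
        printf("%zu %zu\n", sizeof(a), sizeof(aa));   /* e.g. "64 8": 16 ints vs. one pointer */
        return 0;
    }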


sizeof behavior is just icing over what is really going on. You might as well argue that arrays really exist because we have the [] operator.


No not really. What's really going on is that A has been declared as an array of 16 ints, while aa has been declared as a pointer to int. Those are different types, and C treats them differently sometimes (but not all times).


Aside from the additional information that the C compiler is capable of knowing about in the case of A, it is the same crap going on under the hood.


You're not writing in under-the-hood, though, you are writing in C. If you want to properly understand compiler warnings and errors you need to know that arrays and pointers are separate types.

Yes, "under the hood" it might be the same, but that's true of many languages if you dig deep enough. C is just a little closer. Yes, at one level a char declaration is just a smaller minimum memory allocation than int, but C will check both those types and if you want to use C properly you'll want to understand its typing and casting rules.


Well, QED.


Programming boils down to paying attention to details. C is an extremely small, simple, predictable and malleable [] language that makes you painfully aware of that fact. For me, it is a pleasure to read well-written C code, for example the NetBSD kernel. There's a lot to learn there about elegance and good design.

[] malleable in the sense of bottom-up programming and defining your own "vocabularies". When I develop in C (and C++) I usually solve the problems 75% bottom-up and 25% top-down. With the bottom-up approach, it is easy to stay focused on the problem at hand and write mostly bug-free code. The "top-down" bits just put pieces of the puzzle together.


It is possible to program C with a functional mindset, but the syntax does get in the way. The function syntax does not distinguish between parameters that are modified and those that are not, though you can by convention, and you can use structs to bundle stuff up. You can pass function pointers liberally. In the end, though, the number of use cases for C is smaller now, and you should be able to avoid it for large projects and only use it for small pieces.


Actually, C89 can distinguish between parameters that are modifiable and those that are not. The syntax leaves a bit to be desired ("const foo_t *x", "foo_t *const x" and "const foo_t *const x" all have different semantics) but it can be done.
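For reference, a minimal sketch of the three forms:

    void f(const int *p);         /* pointer to const int: *p may not be modified       */
    void g(int *const p);         /* const pointer to int: p itself may not be changed  */
    void h(const int *const p);   /* const pointer to const int: neither may change     */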


I haven't perceived that people think C is significantly scary (or evil). There are, however, many compelling reasons not to choose C in the present RAD/Web/Cloud/Solution Business world.


Given how many people freak out about retain/release memory management in Objective-C, it seems that C, pointers, and manual memory management have got to be a lot scarier.


C's scheme is much easier for me than objective-C's, mainly because of autorelease.


I've programmed now for 5 years, and I find C hard. But then, I've mainly programmed in C# and Python.


I'm very excited to see Zed working on this. This exercise-driven tutorial format seems efficient and practical, but what makes this really stand out is Zed's approach to the overall goal.

I greatly appreciate an 'opinionated' programming book. I've probably heard more debates on formatting and style for C than any other language.


Oh man, totally looking forward to this. It seems like so many programmers are kind of afraid of C because of its reputation and avoid learning it because their language of choice is 'more productive.' Which is really unfortunate: I've found having a good amount of C is hugely beneficial to understanding what's going on under the hood when you're using a higher level language, even if you rarely actually program in C itself.

Plus even if you do primarily program in higher level languages, it's a great tool to have in your belt when you need to fix a bug in a library whose bindings you're using in a higher level language, or when you legitimately do need to eke out a little more performance from a particular piece of code.

Also, love that the book starts by teaching you how to use make as well, so many C books gloss over the tools.


I know a fair amount of C. But I am still looking forward to this book as I hear that LPTHW taught something to even intermediate level people. Besides the exercises are usually good and would like to solve them.



How about having a webpage with recommended libs for newbies to use? Sort of like NodeJS modules page.


I look forward to the content. As a beginning C++ instructor I find there is something lacking between the truth of the language and the conventions for presenting it. Nobody, AFAIK save for the truly hardcore student, has nailed it.


Got any feedback on the things students get wrong? My experience is they fail to grasp memory management, pointers, functions as pointers, linkers or just how a program actually runs.

If you've got others I'd love to hear them.


One of the first programs I tried to write in C++ (my first language) was a chess game. I eventually hacked it together, but it wasn't pretty.

The hardest part for me was learning the patterns necessary to do anything, and specifically to do it remotely well.

For example maybe one of the exercises could be some kind of board game, where you demonstrate how you translate "the ideas" into "the C code"? For example just because you know that a Knight can move two-plus-one spaces doesn't mean you have any idea where that code should be inserted into the program's structure, or how to write it in such a way where you avoid overrunning an array. (The naive solution would crash when the Knight tries to move off the board.)

Translating ideas -> C code was easily my hardest task when I was first starting out.
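To make the knight example concrete, here's a minimal sketch of the kind of bounds check involved (the board representation and names are made up):

    #define BOARD_SIZE 8

    int board[BOARD_SIZE][BOARD_SIZE];   /* 0 = empty, otherwise a piece id */

    /* Returns 1 if a knight may move from (r, c) by (dr, dc). */
    int knight_can_move(int r, int c, int dr, int dc)
    {
        int legal_shape = (dr*dr + dc*dc == 5);            /* the two-plus-one pattern */
        int nr = r + dr, nc = c + dc;
        int on_board = nr >= 0 && nr < BOARD_SIZE &&
                       nc >= 0 && nc < BOARD_SIZE;         /* no array overrun */
        return legal_shape && on_board;
    }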


I've watched and spoken to a large number of beginning C and C++ programmers (and beginning programmers in general) during their first terms of programming in university. I've noticed a few common problems, and I think they all stem from not understanding the details of how a program runs on a real machine:

- The concept of a variable, and variables changing over time, seems quite difficult for people to grasp even when explained a few different ways. "x = 5" followed by "x = 12" proves quite mystifying, and "x = x + 1" even more so. People seem to have the most success with the idea that declaring a variable "int x" creates a location x which can hold an int, and you can put things in that location.

- Pointers actually don't seem to trip that many people up initially, once they get to that point. However, I don't think people actually understand exactly what they do, so much as memorize the rules for dealing with them. The same idea of a location to put something applies here too.

- Any case where the same function gets called more than once often ends up tripping people up; this applies particularly to recursion, but it can happen even when just calling the same function several times. In particular, this often interacts badly with people's understandings of variables. People need some understanding of scope.

- Combining several of the above, it would help to have clear explanations of the interactions between pointers, locations, and functions. Bonus for explaining what goes horribly wrong if a pointer refers to something that goes out of scope. That concept requires understanding several different pieces of C and putting them together.
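For that last point, a minimal example of what goes wrong:

    int *broken(void)
    {
        int x = 42;
        return &x;     /* x lives in broken()'s stack frame...             */
    }                  /* ...which is gone as soon as the function returns */

    /* Dereferencing the returned pointer is undefined behaviour: it may seem
       to work, print garbage, or crash, depending on what has reused that
       stack memory in the meantime. */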


As a teaching assistant for a class (http://www.cs.cmu.edu/~410/) where students write a whole lot of code in C, please introduce the address-of (&) operator and explain how to obtain pointers without using malloc(). Do this before the concept of the heap is ever introduced at all.
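A minimal sketch of the kind of example that helps here (no heap anywhere):

    #include <stdio.h>

    void increment(int *n)
    {
        *n += 1;
    }

    int main(void)
    {
        int counter = 0;
        increment(&counter);       /* & yields a pointer to a stack variable;
                                      no malloc() required                    */
        printf("%d\n", counter);   /* prints 1 */
        return 0;
    }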

Be careful when explaining the compiler and how a program actually runs. I've found that a lot of student problems come from "the compiler is magic" when it really isn't (maybe related to my other surprised comment below, about how people "don't get C"--they attribute too much magic to the compiler.) Maybe even emphasize that every piece of C code can be translated in a fairly easy fashion to a pretty small amount of assembly.

For the preprocessor, emphasize that it is a solely textual replacement, with no symbolic evaluation. Explain why:

    #define FOO BAR + BAZ
or

    #define MAX(x,y) (((x) < (y)) ?  (y) : (x))
will go horribly wrong (the first in 5*FOO, the second in MAX(x++, y++)).
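Spelling the expansions out (a minimal illustration; the BAR and BAZ values are made up):

    #define BAR 2
    #define BAZ 3
    #define FOO BAR + BAZ

    int a = 5 * FOO;          /* expands to 5 * 2 + 3 == 13, not 5 * (2 + 3) == 25 */

    #define MAX(x,y) (((x) < (y)) ? (y) : (x))

    /* inside a function: */
    int i = 1, j = 2;
    int m = MAX(i++, j++);    /* expands to (((i++) < (j++)) ? (j++) : (i++)),
                                 so j is incremented twice and m == 3              */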


I find it is much more helpful, when teaching C, to avoid the word 'address', and call (&) the pointer-to operator. Thinking of pointers as numbers (which the address-analogy does) tends to be harmful for newer C programmers because they want to treat them like numbers.


They are numbers (either 32-bit or 64-bit). Nothing more, nothing less. I don't understand why you wouldn't want to think of them that way.

For example...

  const char* current = "ohai thar";
  const char* end     = current + strlen( current );

  assert( end >= current );
  size_t bytecount = (end - current);
(size_t is an unsigned type, so if 'end' is less than 'current', it will overflow. If you want to allow for that, use ptrdiff_t.)


No, they aren't numbers. The standard is quite clear on this. Consider the old DOS architecture where you had a segment and offset. Your code is also not well-defined C. C only lets you perform subtraction between two pointers into the same block; this code is at best implementation-defined and at worst undefined (I can't remember).


You'd still need to explain pointer arithmetic somehow.


That is easy since pointer arithmetic is only valid inside a contiguous block. You don't need to talk about it in terms of a pointer being a number at all.


Can you get around the second without typeof (a GNU extension)?


at least if you stick to macros


You can't, unfortunately. The point is to be cognizant of the fact that expressions will be evaluated for their effects more than once (maybe). So, don't put effectful things inside a macro expansion.


When learning a new language, the syntax and exercises which explore this come to me fairly quickly. The difficulties are (a) learning the best way of structuring large blocks of code using the new syntax and tools, and (b) for new programmers: understanding how to convert conceptual rules in my head into algorithms.

If you set out to only teach the language syntax and paradigms you are leaving a beginner with a lot of extra work before they can start or contribute to meaningful projects. This is the reason that people learn so much from reading other people's code in open-source projects: most writers skim over trying to teach the most fundamental skill of programming.

I do believe it is possible to teach practicalities in addition to theory and if you attempt to do this, you will be doing a lot more than most writers have done in the past.


My experience as an interviewer showed that in addition to the things you listed, people often fail to grasp binary representation of numbers in a computer at all.

This most often comes up when people are asked to do some binary manipulation of numbers, e.g.:

  unsigned u = 19;
  unsigned v = u >> 1;
"v" is now 9, and to really understand it one must grasp how numbers are represented in binary under the hood.

People also fail to understand strings:

  char* s = "string literal";
To some, it's absolutely opaque that the first byte "s" points to contains 0x73, and that represents "s" in ASCII.


Does C require that you are running on a binary computer?

The C99 standard defines the >> operator in terms of division by powers of 2, so one can determine the result of 19 >> 1 without needing to know anything about how 19 is actually represented by the machine.


> "v" is now 9, and to really understand it one must grasp how numbers are represented in binary under the hood.

Correct me if I am wrong, but I don't think C guarantees anything about the binary representation. Depending on the architecture, `v` can have a different value.


C allows differences in how negative integers are represented (at least, C89 did---I'm not sure about C99 as I don't have the standard in front of me). There are three different ways to represent negative numbers in binary: sign-magnitude, one's complement, and two's complement (these days, most computers are two's complement). The upshot is that you might end up with two types of zero (sign-magnitude, one's complement) or an unequal range of negative numbers (two's complement has one additional negative number), and taking the absolute value of an integer may not be possible for some values (given a 16-bit integer on a two's complement system, you cannot get the absolute value of -32,768, which is a valid integer on such a system).


If 19 is represented differently in binary on your platform than (leading zeros)10011, you are already quite screwed.

You might be thinking of character representation for the later example.


No, I was thinking of binary 19. Does C say the implementation is going to be one's complement, two's complement, or whatever?

Is there anything that says, for example, it can't be BCD?


Yes, even though this is exactly why people want to use C now: to get exact binary behaviour.


The arrow operator and identifiers with leading underscores seem to be glossed over or skipped fairly often in education. It's intimidating to look at code that uses them if you don't know what they are.
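On the arrow operator, a minimal example (the struct is made up):

    struct point { int x, y; };

    void example(void)
    {
        struct point  p  = { 1, 2 };
        struct point *pp = &p;

        int a = p.x;      /* member access on a struct value              */
        int b = pp->y;    /* member access through a pointer to a struct;
                             pp->y is just shorthand for (*pp).y          */
    }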


Identifiers with leading underscores? I use those for member variables... (this makes more sense in C++ without the arrow notation.)


Maybe he's referring to how identifiers with leading underscores followed by a capital letter are reserved in C++ (I'm not sure if they are in C)...


Oho. Followed by a capital letter? Interesting.

I always thought that "all identifiers with a leading underscore were reserved". I just consciously ignored it, and have never had a problem in years. But I was also always using member variable names like "_children", "_childCount", etc, not "_Children".


In fact, here's a reference: http://msdn.microsoft.com/en-us/library/e7f8y25b(v=vs.80).as... (It's in a "Microsoft Specific" block but says it's part of the ANSI C standard.) It applies to C, and it also applies to identifiers starting with two underscores.


The C Standard reserves identifiers starting with a leading underscore for the system implementation (C standard library, Posix libraries, etc). They aren't meant for use by user code (or rather, you can use them, but they might conflict with system defined identifiers).


The first problem is the notion of absolute precision and detail required to do even the simplest task. The next is how syntax relates to such notions. Then comes mapping the syntax to the task. If they don't grok variables at this point, they're not going anywhere. A lot don't. Above all is the imperative of completion: if they don't make it work somehow, it doesn't work period. I'll try to salvage their work, tell them what to fix, and accept resubmissions - but if it doesn't work, it inherently won't pass the low bar for success.


I'm not sure if this is going to be pure C or a mix of C/C++, but if you're sticking to pure-C, here's a chapter idea: using containers in C.

This is the big selling point for C++: the convenience of the STL. In all the projects that I worked on, it was one of the important reasons to choose C++ over C.

Yes, this focuses less on the language itself and more on the ecosystem around it, but I figured if you have an entire chapter dedicated to "make", then this makes sense as well.
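As a taste of what such a chapter might cover, here is a hedged sketch of the simplest possible growable array of ints (the names are made up, and the only error handling is the realloc check):

    #include <stdlib.h>

    typedef struct {
        int    *data;
        size_t  len, cap;
    } IntVec;

    int intvec_push(IntVec *v, int value)
    {
        if (v->len == v->cap) {
            size_t newcap = v->cap ? v->cap * 2 : 8;      /* grow geometrically */
            int *p = realloc(v->data, newcap * sizeof *p);
            if (!p) return -1;                            /* out of memory      */
            v->data = p;
            v->cap  = newcap;
        }
        v->data[v->len++] = value;
        return 0;
    }

Usage would be something like IntVec v = {0}; intvec_push(&v, 42); ... free(v.data); which is roughly the hand-rolled equivalent of push_back on a vector of ints.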


Too often we think of C in terms of commands. C is a matter of _operators_. Consider "=", say in the statement

y = 1 + (x = 3);

Why and how this works is foreign to most beginners. Likewise (ok, this is C++ but the point remains)

cout<<1+x;

isn't a command to display 1+x. The output is the result of inserting the computed value into cout. There is a semantic difference.

Grokking operators early is key to understanding C well.
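The classic K&R idiom makes the same point: because assignment is an expression with a value, you can write

    #include <stdio.h>

    int main(void)
    {
        int c;
        while ((c = getchar()) != EOF)   /* the assignment yields the value just read, */
            putchar(c);                  /* which is then compared against EOF          */
        return 0;
    }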


In my opinion the thing that many new C developers have trouble with is how large the gap can be between what their implementation does and what the standard says. I think newer C programmers are used to languages that specify a lot more than C does. Try It And See is generally not a useful response for a C question.


I was taught C at Uni, but this was after having done a semester of x86 assembly language so we already had a good grounding in pointers, how addresses work etc. This was our textbook http://oreilly.com/catalog/9780937175231


dedicate some later chapters to ramping the reader onto assembler, the stack (and smashing it) and instructions

there are a lot of C devs who don't understand what it is their code is producing, and how an application and memory are managed


While this is important, it probably shouldn't be in a beginner C book.

I understand all of it now, but had no idea about the stack or the x86 instructions, etc for years. I was still productive, and wasn't hindered.

The details can come later. Even big details like "how it works at the low level".

The most important part is to keep things fun; LPTHW was fun. For me, mucking about in assembly wasn't, and that sentiment seems like it might be common among new C programmers. Assembly has a way of slowly steamrolling your motivation.


IMHO, please do have a look at "Expert C Programming"; it is quite nice. The conversational style of the book makes it very enjoyable to read.


using gdb?


Has anyone actually used "Learn Python the Hard Way" in a beneficial way? I started it several times but never got very far and just learned Python from other sources. I didn't really care for the approach.


Was it your first book? If not, you weren't in the target audience from what I make out of the contents of the book.

For people who already know programming, the official tutorial is the fastest and the best way to start: http://docs.python.org/tutorial/index.html


Yes. I found it useful maybe 70% of the time--more than most texts.

The critical factor, IMHO, is clarity and conciseness of writing. Zed does a good job with this (I didn't know his name until this thread, BTW).


Why is %d used to interpolate an integer into a format string? I don't recall seeing this done for any reason before.

I hope Zed actually covers how dangerous format strings can be if not handled properly. Format strings are still (hilariously) one of the major exploitation vectors in C-based applications today.

Edit: According to Wikipedia, %i and %d are synonymous. Sorry for the confusion.


It's not for the type, but for the output. The 'd' means "decimal" output. For example, you can use %x for hexadecimal output, and %o for octal. Of course, there's some complex interplay with what type you hand that function.

As for how "dangerous" it is, yeah it's not going to rape your family, it'll just crash. So I'll be showing people how to prevent it.


> it's not going to rape your family, it'll just crash

Well, if it hits undefined behavior it could in fact rape your family, the standard allows for that.


Oh right, I forgot they added that to the C6.283185307179586 standard.


Well, undefined means the compiler is able to generate any code it pleases, including, but not limited to, segfaulting, invoking NetHack, or raping your family.


I'm amused you took the time to put in 2pi there.


I put tau in there, get it right.


Google bid pi billion for the patents.

Intelligent people support Tau: http://tauday.com/ This is the least Zed could do.


It'll crash and potentially allow code execution if the rest of the args are user coercible. Check out the first few chapters of The Shellcoder's Handbook for examples.

Thanks for the explanation. I've never seen/had to use %d with integers. I've only used it with %.Nd for floats/doubles.


code/ex3.c:6: warning: format ‘%d’ expects type ‘int’, but argument 2 has type ‘double’

You should probably read this book. :-)


Hmm, I don't recall MSVC++ ever warning about that! Unfortunately, I did use Windows at one point in my career.

I'll pass on reading it since I don't intend to write C ever again, but I hope the book works out well for you. :)

Edit: Just dug up some of my old C code. It is indeed %.2f that I was thinking of. Sorry for the confusion once again.


Leaving it at "crash and don't do it" is enough for a beginner C book.

Mentioning code execution and shellcode isn't really in scope. If he mentions code execution, then it sort of warrants mentioning that modern architectures prevent executing data as code, or that the code segment is not writable on many architectures... and shellcode is basically the opcodes your machine executes; injecting shellcode in the absence of any protection mechanism will execute arbitrary code, and in the presence of a protection mechanism, it will crash.

I think just mentioning shellcode and code execution isn't doing a beginner any good, and explaining it is well out of scope. It's not as if this is going to be the end-all C book, and as long as the reader sticks to using proper format strings, he is good. If he doesn't, knowing what might happen isn't doing much good either.


> as the reader sticks to using proper format strings, he is good.

Another important point to mention is that the format string itself should not be user coercible! XCode/LLVM/whatever Apple is using nowadays actually treats non-constant format strings as a compilation error, which is pretty cool.
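A minimal illustration of the rule (assuming user_input is a char * that came from the outside world):

    printf(user_input);          /* BAD: any %s, %x, %n in the input gets interpreted */
    printf("%s", user_input);    /* OK: the format string is a constant               */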


d is for dinteger, obviously.

Edit: Actually, I finally remembered why I always use %d, and why I find seeing %i odd. While %d and %i may be the same for output with printf, they are different for input with scanf. (Also I'm not sure that %i was part of the C89/90 standard for printf, it may have been a compiler specific thing (another reason to avoid); it's defined in C99 but I don't have a copy of the 89/90 pdf.)


d for decimal?


Right, but I've never seen it used with integers. %.2d is probably the most common format I've seen. Messing up your format strings is a common exploitation vector, so I was wondering if he had typo'd that or otherwise used it accidentally.


You mean %.2f ? %d is for integers, %f is for reals (yo).


Oh I'm totally using that.


I much prefer using %g for the general scenario.


I hope this book will show lots of new C programmers the beauty of pointers in hand-holdy detail ('cause that's the level needed, I feel). Maybe basic stuff about virtual memory, so all the pointer operations make sense (everything is a byte at some offset, etc.).

At least I didn't fully appreciate C until I understood some of the underlying concepts.

Looking forward to reading this!


Having read a fair bit of LPTHW (though I had no actual interest in the language), I am really excited about this.

One of the best things about LPTHW was the context it was written in, and if LCTHW is written in the same way, it should be a really awesome read!


That's the idea, although I'm going to assume you know how to use at least one programming language, and probably redirect total newbies to LPTHW if they don't. That should make it easier to ramp up in C and then do cool useful stuff with it.


That's awesome to hear. I actually know a bit of C, but it has been years since I used it - so this book seems like a good one to read to 'reboot' my C skills.

I'll definitely be waiting anxiously for you to finish.

Good luck!


I jokingly made this suggestion to him on twitter. He posted a few "assignments" and we (I assume it was more than myself participating) posted pics of our console output.

This guy loves to program



Nice! Where's the rest?


The biggest problem with C is not the language C, for it is a small and mostly simple language with a few warts (I'm looking at you, pointer syntax), it is the ecosystem into which you are thrust when you first use it.

That is, the ecosystem of, "What can I include without dicking around with compiler and linker settings, which I do not care to learn very well because I am just starting?", and the ecosystem of, "Why are all these standard libraries full of functions that all the documentation tells me not to use?", and most importantly, the ecosystem of, "Oh, this looks like a nice library that would make my life easy, (and later), wait, why isn't my program working on this other machine? I copied the binary over? Wait, what's this about a missing .so? Oh, that's the library I installed, wait, how do I put it in the same directory? Oh my god so many configuration settings! Wait, why can it still not find the library? It's right there now! What's LD_LIBRARY_PATH? LD_LIBRARY_PATH is bad? Why doesn't -R work? Oh, that's only for Solaris? What's the equivalent for friggin' Linux?! Ah, rpath, wait... it's trying to find the ABSOLUTE path? UGGARRHGHGHAAHHHH! Okay, finally, $ORIGIN... now let me just put that in the make file like they said I should.... AHGHGHGHGHGHGHHHHHHHHHHHHHHH!!!!!!!"

Which is to say, the ecosystem of fucking ratholes that have built up over 40 years of poor tool design that cannot be corrected now due to historical precedent.
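For anyone else stuck in that particular rathole, the incantation I believe is being described goes roughly like this (library name and paths are hypothetical; the quoting keeps the shell from eating $ORIGIN):

    # from a shell:
    gcc main.c -o myapp -L. -lfoo -Wl,-rpath,'$ORIGIN'
    # then ship libfoo.so in the same directory as myapp

    # in a Makefile the dollar sign needs doubling:
    #   LDFLAGS += -Wl,-rpath,'$$ORIGIN'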


> The biggest problem with C is not the language C, for it is a small and mostly simple language with a few warts (I'm looking at you, pointer syntax), it is the ecosystem into which you are thrust when you first use it.

The language C is a big problem for beginners, though. Pointer syntax is not just a tricky optional feature, it's necessary for a number of common tasks including defining functions and passing parameters. The type system is also important, not intuitive and rarely taught effectively. Countless times I read or was told that the syntax for referencing arrays was the same as referencing a pointer in memory, but nobody ever bothered to clarify or reinforce the idea that arrays are still a distinct static type.

Pointers and types are fundamentals and anyone who was lucky enough to learn them early on might not understand how hard it can be to figure this stuff out on your own and how difficult it is to actually use C before you do.


"It's not the language, honey, it's the runtime." (apologies to Indiana Jones)


"It's not the despair, Laura. I can handle the despair. It's the hope I can't stand." (No apologies at all to John Cleese or to anybody else.)


C++ has the same issues.

To add another example, binary only distributions (e.g. some commercial software) bring their dependencies with them, meaning that you have to use roughly the same environment (compiler version, stl) to use them.

Libraries work a lot better in VM languages.


It doesn't take a VM language or an interpreted language to get libraries working well. Take a look at Haskell's Cabal system and Hackage package library. It works pretty well and doesn't use a VM.

With C and C++, you'll have to use your operating system's package management to get all the important libraries (or build them by hand from Git sources, etc), then have a build system that configures the build environment and searches for all the libraries and other dependencies. It's not as nice as using a dedicated tool for this, like Gem and Bundler in Ruby, but usually you get the job done - unless you work on Windows and don't have a package manager.


> Oh my god so many configuration settings!

There are many alternatives for you that offer 1click build/deployment.


There are so many, in fact, that choosing one becomes a confusing array of choices in itself.


Use cmake, no fuck this, try JAM. WTF? Why not simple Makefiles, better yet manually written projects for XCODE, VCPROJ.... Oh fuck no.... At the end of the day you wrote something to get your shit out. Simple as that :)

Just google for "msvcrt_win2000.obj" and see the madness (yes, it's about using MSVCRT.DLL instead of later MSVC libs, and still get your shit working on 2000 or XP). I did that just last few hours :)


If you statically link, will it only run on the latest platform? I don't believe I've ever shipped an app that dynamically linked to the MSVCRT.


It can, and that's what I did. But I decided to experiment, and there is some small benefit - less memory usage, and supposedly malloc() from one place can be free()d by another (or so I think). A lot of Microsoft products still link to it. At work we don't care and link to whatever the latest MSVC provides, but for certain products, to write plugins we have to use the exact same Visual Studio version (for example Autodesk Maya or MotionBuilder).

There are way too many subtle details (and more complicated with Windows's manifests, side by side assemblies and crap like that).


Hah! Exactly.

There is one that I can recommend, for the record, at least for C++:

Qt's QMake.



The last project I tried that used qmake missed dependencies on generated files, and therefore needed make clean fairly often.

It's sad, but the best user experiences I've had were either hand-coded makefiles, or autotools (ick).


I have the "learncthehardway.com" domain, waiting for Zed to stop being so insanely busy so that I can do the hand-off. :)


I'm skeptical that it will be an alternative to K&R 2nd ed but nonetheless I'm interested in the result.


Isn't "C For Programmers" essentially "HOWTO Use Pointers"? Is there another really complicated idea that C has which most of its heirs do not have?


Strings, and why strlen(s) != the number of letters in a string (UTF-8).

Why a string must end with a 0 byte.

Binary shifting numbers, especially signed numbers.

Memory management; it's fun for the whole family, and more than just knowing how to allocate memory and store a pointer to it.

Some other points that I'm sure I'm missing.
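On the strlen point at the top of that list, a minimal illustration (the byte count assumes the source file is UTF-8 encoded):

    #include <stdio.h>
    #include <string.h>

    int main(void)
    {
        const char *s = "héllo";      /* 5 letters, but 'é' is 2 bytes in UTF-8      */
        printf("%zu\n", strlen(s));   /* prints 6: bytes up to the '\0', not letters */
        return 0;
    }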


> Why a string must end with a 0 byte.

Unless it's a pstring or you're working with assembly, other language, etc.

But yes there is a lot to learn with C you don't necessarily get with higher level languages in most cases.


It should say "Learn C and make the hard way".


Oh, there are a lot more tools on the way. C is one language that you could spend whole books on just the tools.


May be a good idea to mention some of them. `valgrind` and `lint` come to mind. On second thoughts, `lint` can be dropped; compiler warnings are good enough for a beginner (in fact, good enough for experienced programmers; subjective).
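For the record, the basic workflow is short enough to show (prog.c being whatever you're testing):

    gcc -g -Wall -Wextra prog.c -o prog
    valgrind --leak-check=full ./prog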


The original lint tools seem entirely superseded by modern compiler warnings. On the other hand, a whole new class of static analysis tools exist now, such as Sparse, Coccinelle, Frama-C, and in the proprietary world Coverity.


The problem with C isn't C. It's shitty C programmers.

Learn how the language works, follow good patterns, and expect some bumps in the road.

Most of the time, though, you can use a different higher level language, and be a happier and less stressed programmer.


Go Zed! I'll buy this when it's ready for payment!



Actually, that looks like a pretty easy way to learn C; a whole semester on it, beginning with things like using ssh? With instructors and TAs?

But ooh, Columbia, so you know it's hardkore.



